This Application is a Continuation-In-Part of U.S. Application Serial No.
08/598,591 filed on February 12, 1996.

CODING SEQUENCES OF THE HUMAN BRCA1 GENE 


FIELD OF THE INVENTION 

This invention relates to a gene which has been associated with breast and
ovarian cancer where the gene is found to be mutated. More specifically, this
invention relates to three coding sequences of the BRCA1 gene (BRCA1(omi1),
BRCA1(omi2), and BRCA1(omi3)) isolated from human subjects.

BACKGROUND OF THE INVENTION

It has been estimated that about 5-10% of breast cancer is inherited. Rowell,
S., et al., American Journal of Human Genetics 55:861-865 (1994). Located on
chromosome 17, BRCA1 is the first gene identified as conferring increased risk
for breast and ovarian cancer. Miki et al., Science 266:66-71 (1994). Mutations in
this "tumor suppressor" gene are thought to account for roughly 45% of
inherited breast cancer and 80-90% of families with increased risk of early onset
breast and ovarian cancer. Easton et al., American Journal of Human Genetics
52:678-701 (1993).

Locating one or more mutations in the BRCA1 region of chromosome 17 
provides a promising approach to reducing the high incidence and mortality 
associated with breast and ovarian cancer through the early detection of women 
at high risk. These women, once identified, can be targeted for more aggressive 
prevention programs. Screening is carried out by a variety of methods which
include karyotyping, probe binding and DNA sequencing. 

In DNA sequencing technology, genomic DNA is extracted from whole
blood and the coding sequences of the BRCA1 gene are amplified. The coding
sequences may be sequenced completely and the results compared to the
reference DNA sequence of the gene. Alternatively, the coding sequence of the
sample gene may first be compared to a panel of known mutations before the
gene is completely sequenced and compared to a normal sequence of the gene.

If a mutation in the BRCA1 coding sequence is found, it may be possible to
provide the individual with increased expression of the gene through gene
transfer therapy. It has been demonstrated that the gene transfer of the BRCA1
coding sequence into cancer cells inhibits their growth and reduces
tumorigenesis of human cancer cells in nude mice. Jeffrey Holt and his
colleagues conclude that the product of BRCA1 expression is a secreted tumor
growth inhibitor, making BRCA1 an ideal gene for gene therapy studies.
Transduction of only a moderate percentage of tumor cells apparently produces
enough growth inhibitor to inhibit all tumor cells. Arteaga, CL, and JT Holt,
Cancer Research 56:1098-1103 (1996); Holt, JT et al., Nature Genetics 12:298-302
(1996).

The observation of Holt et al. that the BRCA1 growth inhibitor is a
secreted protein leads to the possible use of injection of the growth inhibitor into
the area of the tumor for tumor suppression.

The BRCA1 gene is divided into 24 separate exons. Exons 1 and 4 are
noncoding, in that they are not part of the final functional BRCA1 protein
product. The BRCA1 coding sequence spans roughly 5600 base pairs (bp). Each
exon consists of 200-400 bp, except for exon 11 which contains about 3600 bp. To
sequence the coding sequence of the BRCA1 gene, each exon is amplified
separately and the resulting PCR products are sequenced in the forward and
reverse directions. Because exon 11 is so large, we have divided it into twelve
overlapping PCR fragments of roughly 350 bp each (segments "A" through "L" of
BRCA1 exon 11).
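
By way of illustration only, the division of a large exon into overlapping fragments can be sketched as follows. The sketch assumes evenly spaced fragments of identical length, which is a simplification: the actual fragment boundaries are set by primer placement and are not given here. The function name `tile_region` and the exact figures of 3600 bp and 350 bp are assumptions taken from the approximate values in the text.

```python
def tile_region(region_length, n_fragments, fragment_length):
    """Return 0-based, half-open (start, end) pairs for evenly spaced,
    overlapping fragments that together cover the whole region."""
    if n_fragments < 2:
        raise ValueError("need at least two fragments")
    if n_fragments * fragment_length < region_length:
        raise ValueError("fragments are too short to cover the region")
    # Space the start positions so that the last fragment ends exactly
    # at the end of the region.
    step = (region_length - fragment_length) / (n_fragments - 1)
    tiles = []
    for i in range(n_fragments):
        start = round(i * step)
        tiles.append((start, start + fragment_length))
    return tiles


# Approximate figures from the text: about 3600 bp in exon 11, divided
# into twelve fragments ("A" through "L") of roughly 350 bp each.
labels = "ABCDEFGHIJKL"
tiles = tile_region(3600, 12, 350)
for label, (start, end) in zip(labels, tiles):
    print(label, start, end)
```

With these assumed figures, neighbouring fragments overlap by roughly 55 bp, so every base of the exon is covered by at least one fragment.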

Many mutations and polymorphisms have already been reported in the
BRCA1 gene. A world wide web site has been built to facilitate the detection and
characterization of alterations in breast cancer susceptibility genes. Such
mutations in BRCA1 can be accessed through the Breast Cancer Information Core
at: http://www.nchgr.nih.gov/dir/lab_transfer/bic. This data site became
publicly available on November 1, 1995. Friend, S. et al., Nature Genetics 11:238
(1995).

The genetics of Breast/Ovarian Cancer Syndrome is autosomal dominant
with reduced penetrance. In simple terms, this means that the syndrome runs
through families such that both sexes can be carriers (only women get the disease,
but men can pass it on); all generations will likely have breast cancer, ovarian
cancer, or both, sometimes in the same individual; and occasionally women carriers
either die young before they have time to manifest disease (and yet their offspring
get it) or never develop breast or ovarian cancer and die of old age (the latter
are said to show "reduced penetrance" because they never develop cancer).
Pedigree analysis and genetic counseling are absolutely essential to the proper
workup of a family prior to any lab work.

Until now, only a single coding sequence for the BRCA1 gene has been
available for comparison to patient samples. That sequence is available as
GenBank Accession Number U14680. There is a need in the art, therefore, to
have available a coding sequence which is the BRCA1 coding sequence found in
the majority of the population, a "consensus coding sequence", BRCA1(omi1),
SEQ. ID. NO.: 1. A consensus coding sequence will make it possible for true mutations
to be easily identified or differentiated from polymorphisms. Identification of
mutations of the BRCA1 gene and protein would allow more widespread
diagnostic screening for hereditary breast and ovarian cancer than is currently
possible. Two additional coding sequences have been isolated and characterized.
The BRCA1(omi2), SEQ. ID. NO.: 3, and BRCA1(omi3), SEQ. ID. NO.: 5, coding
sequences also have utility in diagnosis, gene therapy and in making therapeutic
BRCA1 protein.

A coding sequence of the BRCA1 gene which occurs most commonly in
the human gene pool is provided. The most commonly occurring coding
sequence more accurately reflects the most likely sequence to be found in a
subject. Use of the coding sequence BRCA1(omi1), SEQ. ID. NO.: 1, rather than the
previously published BRCA1 sequence, will reduce the likelihood of
misinterpreting a "sequence variation" found in the population (i.e., a
polymorphism) as a pathologic "mutation" (i.e., one that causes disease in the
individual or puts the individual at a high risk of developing the disease). With
large interest in breast cancer predisposition testing, misinterpretation is
particularly worrisome. People who already have breast cancer are asking the
clinical question: "Is my disease caused by a heritable genetic mutation?" The
relatives of those with breast cancer are asking the question: "Am I also a
carrier of the mutation my relative has? If so, is my risk increased, and should I
undergo a more aggressive surveillance program?"

SUMMARY OF THE INVENTION 

The present invention is based on the isolation of three coding sequences
of the BRCA1 gene found in human individuals.

It is an object of the invention to provide the most commonly occurring
coding sequence of the BRCA1 gene.

It is another object of this invention to provide two other coding
sequences of the BRCA1 gene.

It is another object of the invention to provide three protein sequences
coded for by three of the coding sequences of the BRCA1 gene.

It is another object of the invention to provide a list of the codon pairs
which occur at each of seven polymorphic points on the BRCA1 gene.

It is another object of the invention to provide the rates of occurrence for
the codons.

It is another object of the invention to provide a method wherein BRCA1,
or parts thereof, is amplified with one or more oligonucleotide primers.

It is another object of this invention to provide a method of identifying
individuals who carry no mutation(s) of the BRCA1 coding sequence and
therefore have no increased genetic susceptibility to breast or ovarian cancer
based on their BRCA1 genes.

It is another object of this invention to provide a method of identifying a
mutation leading to an increased genetic susceptibility to breast or ovarian
cancer.

There is a need in the art for a sequence of the BRCA1 gene and for the
protein sequence of BRCA1 as well as for an accurate list of codons which occur at
polymorphic points on a sequence.

A person skilled in the art of genetic susceptibility testing will find the present
invention useful for:

a) identifying individuals having a BRCA1 gene with no coding
mutations, who therefore cannot be said to have an increased
genetic susceptibility to breast or ovarian cancer from their BRCA1
genes;

b) avoiding misinterpretation of polymorphisms found in the BRCA1
gene;

c) determining the presence of a previously unknown mutation in the
BRCA1 gene;

d) identifying a mutation which increases the genetic susceptibility to
breast or ovarian cancer;

e) probing a human sample of the BRCA1 gene;

f) performing gene therapy; and

g) making a functioning tumor growth inhibitor protein coded for
by one of the BRCA1(omi) genes.

BRIEF DESCRIPTION OF THE FIGURE 

As shown in FIGURE 1, the alternative alleles at polymorphic (non-mutation-causing
variation) sites along a chromosome can be represented as a "haplotype"
within a gene such as BRCA1. The BRCA1(omi1) haplotype is shown in Figure 1
with dark shading (encompassing the alternative alleles found at nucleotide sites
2201, 2430, 2731, 3232, 3667, 4427, and 4956). For comparison, the haplotype that is
in GenBank is shown with no shading. As can be seen from the figure, the
common "consensus" haplotype is found intact in five separate chromosomes
labeled with the OMI symbol (numbers 1-5 from left to right). Two additional
haplotypes (BRCA1(omi2) and BRCA1(omi3)) are represented with mixed dark and
light shading (numbers 7 and 9 from left to right). In total, 7 of 10 haplotypes
along the BRCA1 gene are unique.
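
As a rough sketch of how the seven nucleotide sites listed above might be used when a sample is compared against a reference coding sequence, the following separates differences at those sites from differences elsewhere. Several things here are assumptions rather than part of the disclosure: the function names, the treatment of each site as a single 1-based nucleotide coordinate, the restriction to substitutions in sequences of equal length, and the toy sequences themselves. A difference at a listed site would still need its codon checked against the known alternatives; the sketch checks position only.

```python
# Nucleotide sites named in the text as polymorphic.
KNOWN_POLYMORPHIC_POSITIONS = frozenset(
    {2201, 2430, 2731, 3232, 3667, 4427, 4956}
)


def find_differences(sample, reference):
    """List (position, reference_base, sample_base) for every mismatch.

    Positions are 1-based. Only substitutions are handled, so both
    sequences must have the same length (an assumption of this sketch).
    """
    if len(sample) != len(reference):
        raise ValueError("sequences must be the same length")
    return [
        (i + 1, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


def classify_differences(differences):
    """Split differences into those at listed polymorphic sites and the rest."""
    at_known_sites = [d for d in differences if d[0] in KNOWN_POLYMORPHIC_POSITIONS]
    elsewhere = [d for d in differences if d[0] not in KNOWN_POLYMORPHIC_POSITIONS]
    return at_known_sites, elsewhere


# Toy data only: a placeholder reference of 5000 identical bases with
# two substitutions introduced into the sample.
reference = "A" * 5000
sample_bases = list(reference)
sample_bases[2201 - 1] = "G"  # falls on a listed polymorphic site
sample_bases[100 - 1] = "T"   # falls outside the listed sites
sample = "".join(sample_bases)

polymorphisms, candidates = classify_differences(find_differences(sample, reference))
print("at listed sites:", polymorphisms)
print("elsewhere (candidate mutations):", candidates)
```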

DETAILED DESCRIPTION OF THE INVENTION



DEFINITIONS 

The following definitions are provided for the purpose of understanding this
invention.

"Breast and Ovarian cancer" is understood by those skilled in the art to
include breast and ovarian cancer in women and also breast and prostate cancer
in men. BRCA1 is associated with genetic susceptibility to inherited breast and
ovarian cancer in women and also breast and prostate cancer in men. Therefore,
claims in this document which recite breast and/or ovarian cancer refer to breast,
ovarian and prostate cancers in men and women.

"Coding sequence" or "DNA coding sequence" refers to those portions of a
gene which, taken together, code for a peptide (protein), or whose nucleic acid
itself has function.

"Protein" or "peptide" refers to a sequence of amino acids which has
function.

"BRCA1(omi)" refers collectively to the "BRCA1(omi1)", "BRCA1(omi2)" and
"BRCA1(omi3)" coding sequences.

"BRCA1(omi1)" refers to SEQ. ID. NO.: 1, a coding sequence for the BRCA1
gene. The coding sequence was found by end-to-end sequencing of BRCA1 alleles
from individuals randomly drawn from a Caucasian population found to have
no family history of breast or ovarian cancer. The sequenced gene was found not
to contain any mutations. BRCA1(omi1) was determined to be a consensus
sequence by calculating the frequency with which the coding sequence occurred
among the sample alleles sequenced.
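
The frequency calculation described above can be sketched as follows, purely for illustration. The function name `consensus_allele` and the short placeholder strings standing in for sequenced alleles are hypothetical; real alleles would be full-length coding sequences, and the sketch does not say how ties between equally common alleles should be resolved.

```python
from collections import Counter


def consensus_allele(alleles):
    """Return the most frequently observed allele and its frequency (0 to 1)."""
    if not alleles:
        raise ValueError("no alleles supplied")
    counts = Counter(alleles)
    allele, count = counts.most_common(1)[0]
    return allele, count / len(alleles)


# Placeholder strings, not real BRCA1 data: ten sequenced alleles, five
# of which are identical.
sequenced = ["ACGT"] * 5 + ["ACGA"] * 2 + ["ACTT", "GCGT", "ACGG"]
allele, frequency = consensus_allele(sequenced)
print(allele, frequency)
```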

"BRCA1(omi2)" and "BRCA1(omi3)" refer to SEQ. ID. NO.: 3 and SEQ. ID. NO.: 5,
respectively. They are two additional coding sequences for the BRCA1 gene
which were also isolated from individuals randomly drawn from a Caucasian
population found to have no family history of breast or ovarian cancer.

"Primer" as used herein refers to a sequence comprising about 20 or more
nucleotides of the BRCA1 gene.

"Genetic susceptibility" refers to the susceptibility to breast or ovarian 
cancer due to the presence of a mutation in the BRCA1 gene. 


A "target polynucleotide" refers to the nucleic acid sequence of interest e.g., 
the BRCA1 encoding polynucleotide. Other primers which can be used for 
primer hybridization will be known or readily ascertainable to those of skill in 
the art. 


"Consensus" means the most commonly occurring in the population. 
"Consensus genomic sequence" means the allele of the target gene which 
occurs with the greatest frequency in a population of individuals having no 
family history of disease associated with the target gene. 

"Substantially complementary to" refers to probe or primer sequences
which hybridize to the sequences provided under stringent conditions and/or
sequences having sufficient homology with BRCA1 sequences, such that the
allele specific oligonucleotide probes or primers hybridize to the BRCA1 sequences
to which they are complementary.

"Haplotype" refers to a series of alleles within a gene on a chromosome. 

"Isolated" as used herein refers to substantially free of other nucleic acids, 
proteins, lipids, carbohydrates or other materials with which they may be
associated. Such association is typically either in cellular material or in a
synthesis medium.

"Mutation" refers to a base change or a gain or loss of base pair(s) in a DNA
sequence, which results in a DNA sequence which codes for a non-functioning
protein or a protein with substantially reduced or altered function.


"Polymorphism" refers to a base change which is not associated with 
known pathology. 

"Tumor growth inhibitor protein" refers to the protein coded for by the 
BRCA1 gene. The functional protein is thought to suppress breast and ovarian
tumor growth.

The invention in several of its embodiments includes:

1. An isolated consensus DNA sequence of the BRCA1 coding sequence as set
forth in SEQ. ID. NO.: 1.

2. A consensus protein sequence of the BRCA1 protein as set forth in
SEQ. ID. NO.: 2.

3. An isolated coding sequence of the BRCA1 gene as set forth in
SEQ. ID. NO.: 3.

4. A protein sequence of the BRCA1 protein as set forth in
SEQ. ID. NO.: 4.

5. An isolated coding sequence of the BRCA1 gene as set forth in
SEQ. ID. NO.: 5.

6. A protein sequence of the BRCA1 protein as set forth in SEQ. ID. NO.: 6.


7. A BRCA1 gene with a BRCA1 coding sequence not associated with
breast or ovarian cancer which comprises an alternative pair of codons, AGC
and AGT, which occur at position 2201 at frequencies of about 35-45%, and
from about 55-65%, respectively.

8. A BRCA1 gene according to Claim 7 wherein AGC occurs at a
frequency of about 40%.

9. A set of at least two alternative codon pairs which occur at
polymorphic positions in a BRCA1 gene with a BRCA1 coding sequence not
associated with breast or ovarian cancer, wherein codon pairs are selected
from the group consisting of:

• AGC and AGT at position 2201;

• TTG and CTG at position 2430;

• CCG and CTG at position 2731;

• GAA and GGA at position 3232;

• AAA and AGA at position 3667;

• TCT and TCC at position 4427; and

• AGT and GGT at position 4956.

10. A set of at least two alternative codon pairs according to claim 9,
wherein the codon pairs occur in the following frequencies, respectively, in a
population of individuals free of disease:

• at position 2201, AGC and AGT occur at frequencies from about
35-45%, and from about 55-65%, respectively;

• at position 2430, TTG and CTG occur at frequencies from about
35-45%, and from about 55-65%, respectively;

• at position 2731, CCG and CTG occur at frequencies from about
25-35%, and from about 65-75%, respectively;

• at position 3232, GAA and GGA occur at frequencies from about
35-45%, and from about 55-65%, respectively;

• at position 3667, AAA and AGA occur at frequencies from about
35-45%, and from about 55-65%, respectively;

• at position 4427, TCT and TCC occur at frequencies from about
45-55%, and from about 45-55%, respectively; and

• at position 4956, AGT and GGT occur at frequencies from about
35-45%, and from about 55-65%, respectively.



11. A set according to Claim 10 which is at least three codon pairs.

12. A set according to Claim 10 which is at least four codon pairs.

13. A set according to Claim 10 which is at least five codon pairs.

14. A set according to Claim 10 which is at least six codon pairs.

15. A set according to Claim 10 which is at least seven codon pairs.



16. A method of identifying individuals having a BRCA1 gene with a
BRCA1 coding sequence not associated with disease, comprising:

(a) amplifying a DNA fragment of an individual's BRCA1
coding sequence using an oligonucleotide primer which
specifically hybridizes to sequences within the gene;

(b) sequencing said amplified DNA fragment by dideoxy
sequencing;

(c) repeating steps (a) and (b) until said individual's BRCA1
coding sequence is completely sequenced;

(d) comparing the sequence of said amplified DNA fragment
to a BRCA1(omi) DNA sequence, SEQ. ID. NO.: 1, SEQ. ID.
NO.: 3, or SEQ. ID. NO.: 5;

(e) determining the presence or absence of each of the
following polymorphic variations in said individual's
BRCA1 coding sequence:

• AGC and AGT at position 2201,

• TTG and CTG at position 2430,

• CCG and CTG at position 2731,

• GAA and GGA at position 3232,

• AAA and AGA at position 3667,

• TCT and TCC at position 4427, and

• AGT and GGT at position 4956;

(f) determining any sequence differences between said
individual's BRCA1 coding sequences and SEQ. ID. NO.: 1,
SEQ. ID. NO.: 3, or SEQ. ID. NO.: 5, wherein the presence of
said polymorphic variations and the absence of a
variation outside of positions 2201, 2430, 2731, 3232,
3667, 4427, and 4956 is correlated with an absence of
increased genetic susceptibility to breast or ovarian
cancer resulting from a BRCA1 mutation in the BRCA1
coding sequence.

17. A method of claim 16 wherein codon variations occur at the
following frequencies, respectively, in a population of individuals free of 
disease: 

• at position 2201, AGC and AGT occur at frequencies from about 
35-45%, and from about 55-65%, respectively; 

• at position 2430, TTG and CTG occur at frequencies from about 
35-45%, and from about 55-65%, respectively; 

• at position 2731, CCG and CTG occur at frequencies from about 
25-35%, and from about 65-75%, respectively; 

• at position 3232, GAA and GGA occur at frequencies from about 
35-45%, and from about 55-65%, respectively; 

• at position 3667, AAA and AGA occur at frequencies from about 
35-45%, and from about 55-65%, respectively; 

• at position 4427, TCT and TCC occur at frequencies from about 
45-55%, and from about 45-55%, respectively; and 

• at position 4956, AGT and GGT occur at frequencies from about 
35-45%, and from about 55-65%, respectively. 

18. A method according to claim 16 wherein said oligonucleotide primer is
labeled with a radiolabel, a fluorescent label, a bioluminescent label, a
chemiluminescent label, or an enzyme label.

19. A method of detecting an increased genetic susceptibility to breast and
ovarian cancer in an individual resulting from the presence of a mutation in
the BRCA1 coding sequence, comprising:

(a) amplifying a DNA fragment of an individual's BRCA1
coding sequence using an oligonucleotide primer which
specifically hybridizes to sequences within the gene;

(b) sequencing said amplified DNA fragment by dideoxy
sequencing;

(c) repeating steps (a) and (b) until said individual's BRCA1
coding sequence is completely sequenced;

(d) comparing the sequence of said amplified DNA fragment
to a BRCA1(omi) DNA sequence, SEQ. ID. NO.: 1, SEQ. ID.
NO.: 3, or SEQ. ID. NO.: 5;

(e) determining any sequence differences between said
individual's BRCA1 coding sequences and SEQ. ID. NO.: 1,
SEQ. ID. NO.: 3, or SEQ. ID. NO.: 5, to determine the
presence or absence of base changes in said individual's
BRCA1 coding sequence, wherein a base change which is
not any one of the following:

• AGC and AGT at position 2201,

• TTG and CTG at position 2430,

• CCG and CTG at position 2731,

• GAA and GGA at position 3232,

• AAA and AGA at position 3667,

• TCT and TCC at position 4427, and

• AGT and GGT at position 4956

is correlated with the potential of increased genetic susceptibility to
breast or ovarian cancer resulting from a BRCA1
mutation in the BRCA1 coding sequence.

20. A method of claim 19 wherein codon variations occur at the following
frequencies, respectively, in a population free of disease:

• at position 2201, AGC and AGT occur at frequencies of about
40%, and from about 55-65%, respectively;

• at position 2430, TTG and CTG occur at frequencies from about
35-45%, and from about 55-65%, respectively;

• at position 2731, CCG and CTG occur at frequencies from about
25-35%, and from about 65-75%, respectively;

• at position 3232, GAA and GGA occur at frequencies from about
35-45%, and from about 55-65%, respectively;

• at position 3667, AAA and AGA occur at frequencies from about
35-45%, and from about 55-65%, respectively;

• at position 4427, TCT and TCC occur at frequencies from about
45-55%, and from about 45-55%, respectively; and

• at position 4956, AGT and GGT occur at frequencies from about
35-45%, and from about 55-65%, respectively.

21. A method according to claim 19 wherein said oligonucleotide primer is
labeled with a radiolabel, a fluorescent label, a bioluminescent label, a
chemiluminescent label, or an enzyme label.

22. A set of codon pairs, which occur at polymorphic positions in a
BRCA1 gene with a BRCA1 coding sequence according to Claim 1, wherein
said set of codon pairs is:

• AGC and AGT at position 2201;

• TTG and CTG at position 2430;

• CCG and CTG at position 2731;

• GAA and GGA at position 3232;

• AAA and AGA at position 3667;

• TCT and TCC at position 4427; and

• AGT and GGT at position 4956.

23. A set of at least two alternative codon pairs according to claim 22
wherein the set of at least two alternative codon pairs occurs at the following
frequencies:

• at position 2201, AGC and AGT occur at frequencies of about
40%, and from about 55-65%, respectively;

• at position 2430, TTG and CTG occur at frequencies from about
35-45%, and from about 55-65%, respectively;

• at position 2731, CCG and CTG occur at frequencies from about
25-35%, and from about 65-75%, respectively;

• at position 3232, GAA and GGA occur at frequencies from about
35-45%, and from about 55-65%, respectively;

• at position 3667, AAA and AGA occur at frequencies from about
35-45%, and from about 55-65%, respectively;

• at position 4427, TCT and TCC occur at frequencies from about
45-55%, and from about 45-55%, respectively; and

• at position 4956, AGT and GGT occur at frequencies from about
35-45%, and from about 55-65%, respectively.



24. A BRCA1 coding sequence according to claim 1 wherein the codon
pairs occur at the following frequencies:

• at position 2201, AGC and AGT occur at frequencies of about
40%, and from about 55-65%, respectively;

• at position 2430, TTG and CTG occur at frequencies from about
35-45%, and from about 55-65%, respectively;

• at position 2731, CCG and CTG occur at frequencies from about
25-35%, and from about 65-75%, respectively;

• at position 3232, GAA and GGA occur at frequencies from about
35-45%, and from about 55-65%, respectively;

• at position 3667, AAA and AGA occur at frequencies from about
35-45%, and from about 55-65%, respectively;

• at position 4427, TCT and TCC occur at frequencies from about
45-55%, and from about 45-55%, respectively; and

• at position 4956, AGT and GGT occur at frequencies from about
35-45%, and from about 55-65%, respectively.



25. A method of determining the consensus genomic sequence or consensus
coding sequence for a target gene, comprising:

a) screening a number of individuals in a population for a family history
which indicates inheritance of normal alleles for a target gene;

b) isolating at least one allele of the target gene from individuals found to
have a family history which indicates inheritance of normal alleles for a target
gene;

c) sequencing each allele;

d) comparing the nucleic acid sequence of the genomic sequence or of the
coding sequence of each allele of the target gene to determine similarities and
differences in the nucleic acid sequence; and

e) determining which allele of the target gene occurs with the greatest
frequency.

26. A method of performing gene therapy, comprising:

a) transfecting cancer cells in vivo with an effective amount of a
vector transformed with a BRCA1 coding sequence of SEQ.
ID. NO.: 1, SEQ. ID. NO.: 3, or SEQ. ID. NO.: 5;

b) allowing the cells to take up the vector; and

c) measuring a reduction in tumor growth.

27. A method of performing protein therapy, comprising:

a) injecting into a patient an effective amount of BRCA1 tumor
growth inhibiting protein of SEQ. ID. NO.: 2, SEQ. ID. NO.:
4, or SEQ. ID. NO.: 6;

b) allowing the cells to take up the protein; and

c) measuring a reduction in tumor growth.
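
For illustration, the codon pairs and approximate frequency ranges recited in the embodiments above can be held in a lookup table and compared against observed counts. This is a sketch only: the names `POLYMORPHIC_SITES`, `observed_percentages` and `within_recited_range` are hypothetical, the "about" in each recited range is treated as a hard bound for simplicity, and the ten observations in the example are invented placeholder data.

```python
from collections import Counter

# Codon pairs and approximate frequency ranges (percent) as recited above.
POLYMORPHIC_SITES = {
    2201: {"AGC": (35, 45), "AGT": (55, 65)},
    2430: {"TTG": (35, 45), "CTG": (55, 65)},
    2731: {"CCG": (25, 35), "CTG": (65, 75)},
    3232: {"GAA": (35, 45), "GGA": (55, 65)},
    3667: {"AAA": (35, 45), "AGA": (55, 65)},
    4427: {"TCT": (45, 55), "TCC": (45, 55)},
    4956: {"AGT": (35, 45), "GGT": (55, 65)},
}


def observed_percentages(codons):
    """Percentage of each codon among the chromosomes examined at one site."""
    counts = Counter(codons)
    total = len(codons)
    return {codon: 100.0 * n / total for codon, n in counts.items()}


def within_recited_range(position, codons):
    """Map each observed codon to True if its percentage lies in the recited range."""
    ranges = POLYMORPHIC_SITES[position]
    result = {}
    for codon, percent in observed_percentages(codons).items():
        if codon not in ranges:
            # Not one of the two recited alternatives at this position.
            result[codon] = False
        else:
            low, high = ranges[codon]
            result[codon] = low <= percent <= high
    return result


# Hypothetical observations at position 2201 on ten chromosomes.
example = ["AGC"] * 4 + ["AGT"] * 6
check = within_recited_range(2201, example)
print(check)
```

A sample as small as the one in the example gives only a coarse estimate; the sketch is meant to show the bookkeeping, not a statistically meaningful test.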

SEQUENCING 

Any nucleic acid specimen, in purified or non-purified form, can be
utilized as the starting nucleic acid or acids, provided it contains, or is suspected
of containing, the specific nucleic acid sequence containing a polymorphic locus.
Thus, the process may amplify, for example, DNA or RNA, including messenger
RNA, wherein DNA or RNA may be single stranded or double stranded. In the
event that RNA is to be used as a template, enzymes and/or conditions optimal
for reverse transcribing the template to DNA would be utilized. In addition, a
DNA-RNA hybrid which contains one strand of each may be utilized. A mixture
of nucleic acids may also be employed, or the nucleic acids produced in a
previous amplification reaction herein, using the same or different primers, may
be so utilized. See TABLE II. The specific nucleic acid sequence to be amplified,
i.e., the polymorphic locus, may be a fraction of a larger molecule or can be
present initially as a discrete molecule, so that the specific sequence constitutes
the entire nucleic acid. It is not necessary that the sequence to be amplified be
present initially in a pure form; it may be a minor fraction of a complex mixture,
such as contained in whole human DNA.

DNA utilized herein may be extracted from a body sample, such as blood,
tissue material and the like, by a variety of techniques such as that described by
Maniatis et al. (Molecular Cloning: A Laboratory Manual, Cold Spring Harbor,
NY, pp. 280-281, 1982). If the extracted sample is impure, it may be treated before
amplification with an amount of a reagent effective to open the cells, or animal
cell membranes of the sample, and to expose and/or separate the strand(s) of the
nucleic acid(s). This lysing and nucleic acid denaturing step to expose and
separate the strands will allow amplification to occur much more readily.

The deoxyribonucleotide triphosphates dATP, dCTP, dGTP, and dTTP are
added to the synthesis mixture, either separately or together with the primers, in
adequate amounts and the resulting solution is heated to about 90°-100°C for
about 1 to 10 minutes, preferably from 1 to 4 minutes. After this heating period,
the solution is allowed to cool, which is preferable for the primer hybridization.
To the cooled mixture is added an appropriate agent for effecting the primer
extension reaction (called herein "agent for polymerization"), and the reaction is
allowed to occur under conditions known in the art. The agent for
polymerization may also be added together with the other reagents if it is heat
stable. This synthesis (or amplification) reaction may occur at room temperature
up to a temperature above which the agent for polymerization no longer
functions. Thus, for example, if DNA polymerase is used as the agent, the
temperature is generally no greater than about 40°C. Most conveniently the
reaction occurs at room temperature.

The primers used to carry out this invention embrace oligonucleotides of
sufficient length and appropriate sequence to provide initiation of
polymerization. Environmental conditions conducive to synthesis include the
presence of nucleoside triphosphates and an agent for polymerization, such as
DNA polymerase, and a suitable temperature and pH. The primer is preferably
single stranded for maximum efficiency in amplification, but may be double
stranded. If double stranded, the primer is first treated to separate its strands
before being used to prepare extension products. The primer must be sufficiently
long to prime the synthesis of extension products in the presence of the inducing
agent for polymerization. The exact length of primer will depend on many
factors, including temperature, buffer, and nucleotide composition. The
oligonucleotide primer typically contains 12-20 or more nucleotides, although it
may contain fewer nucleotides.

Primers used to carry out this invention are designed to be substantially
complementary to each strand of the genomic locus to be amplified. This means
that the primers must be sufficiently complementary to hybridize with their
respective strands under conditions which allow the agent for polymerization to
perform. In other words, the primers should have sufficient complementarity
with the 5' and 3' sequences flanking the mutation to hybridize therewith and
permit amplification of the genomic locus.
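
The relationship between a primer pair and the flanking sequences can be sketched as follows. The sketch assumes exact matching, whereas the text requires only that primers be substantially complementary; the function names and the toy template and primer sequences are hypothetical and are not BRCA1 primers.

```python
# Translation table for Watson-Crick base pairing.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def amplicon(template, forward_primer, reverse_primer):
    """Return the stretch of the (+) strand bounded by the two primers,
    or None if either primer has no exact binding site."""
    start = template.find(forward_primer)
    if start == -1:
        return None
    # The reverse primer anneals to the (+) strand, so the (+) strand
    # text contains its reverse complement downstream of the forward primer.
    site = reverse_complement(reverse_primer)
    end = template.find(site, start + len(forward_primer))
    if end == -1:
        return None
    return template[start:end + len(site)]


# Toy (+) strand and primers, for illustration only.
template = "TTGACCTAGGCATCGATTACGGATCCAAGT"
forward = "GACCTAGG"
reverse = "TTGGATCC"
product = amplicon(template, forward, reverse)
print(product)
```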

Oligonucleotide primers of the invention are employed in the
amplification process, which is an enzymatic chain reaction that produces
exponential quantities of polymorphic locus relative to the number of reaction
steps involved. Typically, one primer is complementary to the negative (-)
strand of the polymorphic locus and the other is complementary to the positive
(+) strand. Annealing the primers to denatured nucleic acid followed by
extension with an enzyme, such as the large fragment of DNA polymerase I
(Klenow), and nucleotides results in newly synthesized + and - strands
containing the target polymorphic locus sequence. Because these newly
synthesized sequences are also templates, repeated cycles of denaturing, primer
annealing, and extension result in exponential production of the region (i.e., the
target polymorphic locus sequence) defined by the primers. The product of the
chain reaction is a discrete nucleic acid duplex with termini corresponding to the
ends of the specific primers employed.
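
The exponential accumulation described above can be illustrated with an idealized calculation in which each cycle multiplies the number of target copies by (1 + efficiency). The function name `copies_after_cycles` and the `efficiency` parameter are assumptions made for this sketch; real reactions plateau in later cycles, and the model does not capture that.

```python
def copies_after_cycles(initial_copies, cycles, efficiency=1.0):
    """Idealized copy number after a given number of amplification cycles.

    efficiency is the fraction of templates copied per cycle, from 0.0
    to 1.0; a value of 1.0 means perfect doubling every cycle.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return initial_copies * (1.0 + efficiency) ** cycles


# One starting copy, perfect doubling, 30 cycles.
print(copies_after_cycles(1, 30))
# The same reaction at a hypothetical 90% per-cycle efficiency.
print(copies_after_cycles(1, 30, efficiency=0.9))
```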

The oligonucleotide primers of the invention may be prepared using any 
suitable method, such as conventional phosphotriester and phosphodiester 
methods or automated embodiments thereof. In one such automated 
embodiment, diethylphosphoramidites are used as starting materials and may be 
synthesized as described by Beaucage, et al., Tetrahedron Letters, 22:1859-1862, 
1981. One method for synthesizing oligonucleotides on a modified solid support 
is described in U.S. Patent No. 4,458,066. 



The agent for polymerization may be any compound or system which will 
function to accomplish the synthesis of primer extension products, including 
enzymes. Suitable enzymes for this purpose include, for example, E. coli DNA 
polymerase I, Klenow fragment of E. coli DNA polymerase, polymerase muteins, 
reverse transcriptase, and other enzymes, including heat-stable enzymes (i.e., 
those enzymes which perform primer extension after being subjected to 
temperatures sufficiently elevated to cause denaturation), such as Taq 
polymerase. A suitable enzyme will facilitate combination of the nucleotides in 
the proper manner to form the primer extension products which are 
complementary to each polymorphic locus nucleic acid strand. Generally, the 
synthesis will be initiated at the 3' end of each primer and proceed in the 5' 
direction along the template strand, until synthesis terminates, producing 
molecules of different lengths. 

The newly synthesized strand and its complementary nucleic acid strand 
will form a double-stranded molecule under hybridizing conditions described 
above and this hybrid is used in subsequent steps of the process. In the next step, 
the newly synthesized double-stranded molecule is subjected to denaturing 
conditions using any of the procedures described above to provide 
single-stranded molecules. 

The steps of denaturing, annealing, and extension product synthesis can be 
repeated as often as needed to amplify the target polymorphic locus nucleic acid 
sequence to the extent necessary for detection. The amount of the specific nucleic 
acid sequence produced will accumulate in an exponential fashion. 
Amplification is described in PCR: A Practical Approach, IRL Press, Eds. M. J. 
McPherson, P. Quirke, and G. R. Taylor, 1992. 

The amplification products may be detected by Southern blot analysis, 
without using radioactive probes. In such a process, for example, a small sample 
of DNA containing a very low level of the nucleic acid sequence of the 
polymorphic locus is amplified, and analyzed via a Southern blotting technique 
or, similarly, using dot blot analysis. The use of non-radioactive probes or labels 
is facilitated by the high level of the amplified signal. Alternatively, probes used 
to detect the amplified products can be directly or indirectly detectably labeled, for 
example, with a radioisotope, a fluorescent compound, a bioluminescent 
compound, a chemiluminescent compound, a metal chelator or an enzyme. 
Those of ordinary skill in the art will know of other suitable labels for binding to 
the probe, or will be able to ascertain such, using routine experimentation. 

Sequences amplified by the methods of the invention can be further 
evaluated, detected, cloned, sequenced, and the like, either in solution or after 
binding to a solid support, by any method usually applied to the detection of a 
specific DNA sequence such as PCR, oligomer restriction (Saiki, et al., 
Bio/Technology, 3:1008-1012, 1985), allele-specific oligonucleotide (ASO) probe 
analysis (Conner, et al., Proc. Natl. Acad. Sci. U.S.A., 80:278, 1983), 
oligonucleotide ligation assays (OLAs) (Landgren, et al., Science, 241:1077, 1988), 
and the like. Molecular techniques for DNA analysis have been reviewed 
(Landgren, et al., Science, 242:229-237, 1988). 

Preferably, the method of amplifying is by PCR, as described herein and as 
is commonly used by those of ordinary skill in the art. Alternative methods of 
amplification have been described and can also be employed as long as the 
BRCA1 locus amplified by PCR using primers of the invention is similarly 
amplified by the alternative means. Such alternative amplification systems 
include but are not limited to self-sustained sequence replication, which begins 
with a short sequence of RNA of interest and a T7 promoter. Reverse 
transcriptase copies the RNA into cDNA and degrades the RNA, followed by 
reverse transcriptase polymerizing a second strand of DNA. Another nucleic 
acid amplification technique is nucleic acid sequence-based amplification 
(NASBA), which uses reverse transcription and T7 RNA polymerase and 
incorporates two primers to target its cycling scheme. NASBA can begin with 
either DNA or RNA and finish with either, and amplifies to 10^8 copies within 60 
to 90 minutes. Alternatively, nucleic acid can be amplified by ligation activated 
transcription (LAT). LAT works from a single-stranded template with a single 
primer that is partially single-stranded and partially double-stranded. 
Amplification is initiated by ligating a cDNA to the promoter oligonucleotide 
and within a few hours, amplification is 10^8 to 10^9 fold. Another amplification 
system useful in the method of the invention is the Qβ replicase system. The 
Qβ replicase system can be utilized by attaching an RNA sequence called MDV-1 
to RNA complementary to a DNA sequence of interest. Upon mixing with a 
sample, the hybrid RNA finds its complement among the specimen's mRNAs 
and binds, activating the replicase to copy the tag-along sequence of interest. 
Another nucleic acid amplification technique, ligase chain reaction (LCR), works 
by using two differently labeled halves of a sequence of interest which are 
covalently bonded by ligase in the presence of the contiguous sequence in a 
sample, forming a new target. The repair chain reaction (RCR) nucleic acid 
amplification technique uses two complementary and target-specific 
oligonucleotide probe pairs, thermostable polymerase and ligase, and DNA 
nucleotides to geometrically amplify targeted sequences. A 2-base gap separates 
the oligonucleotide probe pairs, and the RCR fills and joins the gap, mimicking 
DNA repair. Nucleic acid amplification by strand displacement activation (SDA) 
utilizes a short primer containing a recognition site for HincII with a short 
overhang on the 5' end which binds to target DNA. A DNA polymerase fills in 
the part of the primer opposite the overhang with sulfur-containing adenine 
analogs. HincII is added but only cuts the unmodified DNA strand. A DNA 
polymerase that lacks 5' exonuclease activity enters at the site of the nick and 
begins to polymerize, displacing the initial primer strand downstream and 
building a new one which serves as more primer. SDA produces greater than 
10^7-fold amplification in 2 hours at 37°C. Unlike PCR and LCR, SDA does not 
require instrumented temperature cycling. 

Another method is a process for amplifying nucleic acid sequences from a 
DNA or RNA template, which may be purified or may exist in a mixture of 
nucleic acids. The resulting nucleic acid sequences may be exact copies of the 
template, or may be modified. The process has advantages over PCR in that it 
increases the fidelity of copying a specific nucleic acid sequence, and it allows one 
to more efficiently detect a particular point mutation in a single assay. A target 
nucleic acid is amplified enzymatically while avoiding strand displacement. 
Three primers are used. A first primer is complementary to the first end of the 
target. A second primer is complementary to the second end of the target. A 
third primer is similar to the first end of the target and is substantially 
complementary to at least a portion of the first primer, such that when the third 
primer is hybridized to the first primer, the position of the third primer 
complementary to the base at the 5' end of the first primer contains a 
modification which substantially avoids strand displacement. This method is 
detailed in U.S. Patent 5,593,840 to Bhatnagar et al., 1997. Although PCR is the 
preferred method of amplification of the invention, these other methods can also 
be used to amplify the BRCA1 locus as described in the method of the invention. 
The BRCA1(omi) DNA coding sequences were obtained by end to end 
sequencing of the BRCA1 alleles of five subjects in the manner described above 
followed by analysis of the data obtained. The data obtained provided us with the 
opportunity to evaluate seven previously published polymorphisms and to 
affirm or correct, where necessary, the frequency of occurrence of alternative 
codons. 

GENE THERAPY 

The coding sequences can be used for gene therapy. 
A variety of methods are known for gene transfer, any of which might be 
available for use. 

Direct injection of Recombinant DNA in vivo 

1. Direct injection of "naked" DNA directly with a syringe and needle into a 
specific tissue, infused through a vascular bed, or transferred through a catheter 
into endothelial cells. 

2. Direct injection of DNA that is contained in artificially generated lipid 
vesicles. 

3. Direct injection of DNA conjugated to a targeting structure, such as an 
antibody. 

4. Direct injection by particle bombardment, where the DNA is coated onto 
gold particles and shot into the cells. 



Human Artificial Chromosomes 

This novel gene delivery approach involves the use of human chromosomes 
that have been stripped down to contain only the essential components for 
replication and the genes desired for transfer. 

Receptor-Mediated Gene Transfer 

DNA is linked to a targeting molecule that will bind to specific cell-surface 
receptors, inducing endocytosis and transfer of the DNA into mammalian cells. 
One such technique uses poly-L-lysine to link asialoglycoprotein to DNA. An 
adenovirus is also added to the complex to disrupt the lysosomes and thus allow 
the DNA to avoid degradation and move to the nucleus. Infusion of these 
particles intravenously has resulted in gene transfer into hepatocytes. 

RECOMBINANT VIRUS VECTORS 

Several vectors are used in gene therapy. Among them are the Moloney Murine 
Leukemia Virus (MoMLV) vectors, the adenovirus vectors, the adeno-associated 
virus (AAV) vectors, the herpes simplex virus (HSV) vectors, the 
poxvirus vectors, and human immunodeficiency virus (HIV) vectors. 

GENE REPLACEMENT AND REPAIR 

The ideal genetic manipulation for treatment of a genetic disease would be the 
actual replacement of the defective gene with a normal copy of the gene. 
Homologous recombination is the term used for switching out a section of DNA 
and replacing it with a new piece. By this technique, the defective gene can be 
replaced with a normal gene which expresses a functioning BRCA1 tumor 
growth inhibitor protein. 

A complete description of gene therapy can also be found in "Gene Therapy: A 
Primer For Physicians," 2d Ed., by Kenneth W. Culver, M.D., Publ. Mary Ann 
Liebert Inc. (1996). Two Gene Therapy Protocols for BRCA1 are approved by the 
Recombinant DNA Advisory Committee for Jeffrey T. Holt et al. They are listed 
as 9602-148 and 9603-149 and are available from the NIH. The isolated BRCA1 
gene can be synthesized or constructed from amplification products and inserted 
into a vector such as the LXSN vector. 

The BRCA1 amino acid and nucleic acid sequence may be used to make 
diagnostic probes and antibodies. Labeled diagnostic probes may be used by any 
hybridization method to determine the level of BRCA1 protein in serum or lysed 
cell suspension of a patient, or solid surface cell sample. 

The BRCA1 amino acid sequence may be used to provide a level of 
protection for patients against risk of breast or ovarian cancer or to reduce the 
size of a tumor. Methods of making and extracting proteins are well known. 
Itakura et al., U.S. Patents 4,704,362, 5,221,619, and 5,583,013. BRCA1 has been 
shown to be secreted. Jensen, R.A. et al., Nature Genetics 12:303-308 (1996). 

EXAMPLE 1 

Determination Of The Coding Sequence Of A BRCA1(omi) Gene From Five 
Individuals 

MATERIALS AND METHODS 

Approximately 150 volunteers were screened in order to identify 
individuals with no cancer history in their immediate family (i.e., first and 
second degree relatives). Each person was asked to fill out a hereditary cancer 
prescreening questionnaire. See TABLE I below. Five of these were randomly 
chosen for end-to-end sequencing of their BRCA1 gene. A first degree relative is 
a parent, sibling, or offspring. A second degree relative is an aunt, uncle, 
grandparent, grandchild, niece, nephew, or half-sibling. 

TABLE I 

Hereditary Cancer Pre-Screening Questionnaire 

Part A: Answer the following questions about your family 

1. To your knowledge, has anyone in your family been diagnosed with a very specific 
   hereditary colon disease called Familial Adenomatous Polyposis (FAP)? 

2. To your knowledge, have you or any aunt had breast cancer diagnosed before the age of 35? 

3. Have you had Inflammatory Bowel Disease, also called Crohn's Disease or Ulcerative 
   Colitis, for more than 7 years? 

Part B: Refer to the list of cancers below for your responses only to questions in Part B 

   Bladder Cancer        Lung Cancer           Pancreatic Cancer 
   Breast Cancer         Gastric Cancer        Prostate Cancer 
   Colon Cancer          Malignant Melanoma    Renal Cancer 
   Endometrial Cancer    Ovarian Cancer        Thyroid Cancer 

4. Have your mother or father, your sisters or brothers or your children had any of the listed 
   cancers? 

5. Have there been diagnosed in your mother's brothers or sisters, or your mother's parents 
   more than one of the cancers in the above list? 

6. Have there been diagnosed in your father's brothers or sisters, or your father's parents 
   more than one of the cancers in the above list? 

Part C: Refer to the list of relatives below for responses only to questions in Part C 

   You                         Your mother 
   Your sisters or brothers    Your mother's sisters or brothers (maternal aunts and uncles) 
   Your children               Your mother's parents (maternal grandparents) 

7. Have there been diagnosed in these relatives 2 or more identical types of cancer? 
   Do not count "simple" skin cancer, also called basal cell or squamous cell skin cancer. 

8. Is there a total of 4 or more of any cancers in the list of relatives above other than 
   "simple" skin cancers? 

Part D: Refer to the list of relatives below for responses only to questions in Part D. 

   You                         Your father 
   Your sisters or brothers    Your father's sisters or brothers (paternal aunts and uncles) 
   Your children               Your father's parents (paternal grandparents) 

9. Have there been diagnosed in these relatives 2 or more identical types of cancer? 
   Do not count "simple" skin cancer, also called basal cell or squamous cell skin cancer. 

10. Is there a total of 4 or more of any cancers in the list of relatives above other than "simple" 
    skin cancers? 

© Copyright 1996, OncorMed, Inc. 

Genomic DNA was isolated from white blood cells of five subjects selected 
from analysis of their answers to the questions above. Dideoxy sequence analysis 
was performed following polymerase chain reaction amplification. 

All exons of the BRCA1 gene were subjected to direct dideoxy sequence 
analysis by asymmetric amplification using the polymerase chain reaction (PCR) 
to generate a single stranded product amplified from this DNA sample. 
Shuldiner, et al., Handbook of Techniques in Endocrine Research, p. 457-486, 
DePablo, F., Scanes, C., eds., Academic Press, Inc., 1993. Fluorescent dye was 
attached for automated sequencing using the Taq Dye Terminator® Kit 
(Perkin-Elmer cat# 401628). DNA sequencing was performed in both forward and 
reverse directions on an Applied Biosystems, Inc. (ABI) automated Model 377® 
sequencer. The software used for analysis of the resulting data was Sequence 
Navigator® software purchased through ABI. 

1. Polymerase Chain Reaction (PCR) Amplification 

Genomic DNA (100 nanograms) was extracted from white blood cells of five 
subjects. Each of the five samples was sequenced end to end. Each sample was 
amplified in a final volume of 25 microliters containing 1 microliter (100 
nanograms) genomic DNA, 2.5 microliters 10X PCR buffer (100 mM Tris, pH 8.3, 
500 mM KCl, 1.2 mM MgCl2), 2.5 microliters 10X dNTP mix (2 mM each 
nucleotide), 2.5 microliters forward primer, 2.5 microliters reverse primer, 1 
microliter Taq polymerase (5 units), and 13 microliters of water. 

The primers in Table II, below, were used to carry out amplification of the 
various sections of the BRCA1 gene samples. The primers were synthesized on 
a DNA/RNA Model 394® Synthesizer. 

TABLE II 

BRCA1 PRIMERS AND SEQUENCING DATA 

EXON       PRIMER   SEQUENCE                                 SEQ. ID NO.  MER   Mg++   SIZE 

EXON 2     2F       5'-GAA GTT GTC ATT TTA TAA ACC TTT-3'    7            24    1.6    ~275 
           2R       5'-TGT CTT TTC TTC CCT AGT ATG T-3'      8            22 
EXON 3     3F       5'-TCC TGA CAC AGC AGA CAT TTA-3'        9            21    1.4    ~375 
           3R       5'-TTG GAT TTT CGT TCT CAC TTA-3'        10 
EXON 5     5F       5'-CTC TTA AGG GCA GTT GTG AG-3'         11           20    1.2    ~275 
           5R       5'-TTC CTA CTG TGG TTG CTT CC-3'         12           20 
EXON 6     6/7F     5'-CTT ATT TTA GTG TCC TTA AAA GG-3'     13           23    1.6    ~250 
           6R       5'-TTT CAT GGA CAG CAC TTG AGT G-3'      14           22 
EXON 7     7F       5'-CAC AAC AAA GAG CAT ACA TAG GG-3'     15           23    1.6    ~275 
           6/7R     5'-TCG GGT TCA CTC TGT AGA AG-3'         16           20 
EXON 8     8F1      5'-TTC TCT TCA GGA GGA AAA GCA-3'        17           21    1.2    ~270 
           8R1      5'-GCT GOC TAG CAC AAA TAC AAA-3'        18           21 
EXON 9     9F       5'-CCA CAG TAG ATG CTC AGT AAA TA-3'     19           23    1.2    ~250 
           9R       5'-TAG GAA AAT ACC AGC TTC ATA GA-3'     20 
EXON 10    10F      5'-TGG TCA GCT TTC TGT AAT CG-3'         21           20    1.6    ~250 
           10R      5'-GTA TCT ACC CAC TCT CTT CTT CAG-3'    22           24 
EXON 11A   11AF     5'-CCA CCT CCA AGG TGT ATC A-3'          23           19    1.2 
           11AR     5'-TGT TAT GTT GGC TCC TTG CT-3'         24           20 
EXON 11B   11BF1    5'-CAC TAA AGA CAG AAT GAA TCT A-3'      25           21    1.2    ~400 
           11BR1    5'-GAA GAA CCA GAA TAT TCA TCT A-3'      26           21 
EXON 11C   11CF1    5'-TGA TGG GGA GTC TGA ATC AA-3'         27           20    1.2    ~400 
           11CR1    5'-TCT GCT TTC TTG ATA AAA TCC T-3'      28           22 
EXON 11D   11DF1    5'-AGC GTC CCC TCA CAA ATA AA-3'         29           20    1.2    ~400 
           11DR1    5'-TCA AGC GCA TGA ATA TGC CT-3'         30           20 
EXON 11E   11EF     5'-GTA TAA GCA ATA TGG AAC TCG A-3'      31           22    1.2    388 
           11ER     5'-TTA AGT TCA CTG GTA TTT GAA CA-3'     32           23 
EXON 11F   11FF     5'-GAC AGC GAT ACT TTC CCA GA-3'         33           20    1.2 
           11FR     5'-TGG AAC AAC CAT GAA TTA GTC-3'        34           21 
EXON 11G   11GF     5'-GGA AGT TAG CAC TCT AGG GA-3'         35           20    1.2    423 
           11GR     5'-GCA GTG ATA TTA ACT GTC TGT A-3'      36           22 
EXON 11H   11HF     5'-TOG GTC CTT AAA GAA ACA AAG T-3'      37           22    1.2    366 
           11HR     5'-TCA GGT GAC ATT GAA TCT TCC-3'        38           21 
EXON 11I   11IF     5'-CCA CTT TTT CCC ATC AAG TCA-3'        39           21    1.2    377 
           11IR     5'-TCA GGA TGC TTA CAA TTA CTT C-3'      40           21 
EXON 11J   11JF     5'-CAA AAT TGA ATG CTA TGC TTA GA-3'     41           23    1.2    377 
           11JR     5'-TOG GTA ACC CTG AGO CAA AT-3'         42           20 
EXON 11K   11KF     5'-GCA AAA GCG TCC AGA AAG GA-3'         43           20    1.2    396 
           11KR-1   5'-TAT TTG CAG TCA AGT CTT CCA A-3'      44           22 
EXON 11L   11LF-1   5'-GTA ATA TTG GCA AAG GCA TCT-3'        45           22    1.2 
           11LR     5'-TAA AAT GTT3 CTC CCC AAA AGC A-3'     46           22 
EXON 12    12F      5'-GTC CTG CCA ATG AGA AGA AA-3'         47           20    1.2    ~300 
           12R      5'-TGT CAG CAA ACC TAA GAA TGT-3'        48           21 
EXON 13    13F      5'-AAT GGA AAG CTT CTC AAA GTA-3'        49           21    1.2    ~325 
           13R      5'-ATG TTG GAG CTA GGT CCT TAC-3'        50           21 
EXON 14    14F      5'-CTA ACC TGA ATT ATC ACT ATC A-3'      51           22    1.2    ~310 
           14R      5'-GTG TAT AAA TGC CTG TAT GCA-3'        52           21 
EXON 15    15F      5'-TOG CTG CCC AGG AAG TAT G-3'          53           19    1.2    ~375 
           15R      5'-AAC CAG AAT ATC TTT ATG TAG GA-3'     54           23 
EXON 16    16F      5'-AAT TCT TAA CAG AGA CCA GAA C-3'      55           22    1.6    ~550 
           16R      5'-AAA ACT CTT TCC AGA ATG TTG T-3'      56           22 
EXON 17    17F      5'-GTT3 TAG AAC GTG CAG GAT TG-3'        57           20    1.2    ~275 
           17R      5'-TCG CCT CAT GTG GTT TTA-3'            58           18 
EXON 18    18F      5'-GGC TCT TTA GOT TCT TAG GAC-3'        59           21    1.2    ~350 
           18R      5'-GAG ACC ATT TTC CCA GCA TC-3'         60           20 
EXON 19    19F      5'-CTG TCA TTC TTC CTG TGC TC-3'         61           20    1.2    ~250 
           19R      5'-CAT TGT TAA GGA AAG TOG TGC-3'        62           21 
EXON 20    20F      5'-ATA TGA CGT GTC TCC TCC AC-3'         63           20    1.2    ~425 
           20R      5'-GGG AAT CCA AAT TAC ACA GC-3'         64           20 
EXON 21    21F      5'-AAG CTC TTC CTT TTT GAA AGT C-3'      65           22    1.6    ~300 
           21R      5'-GTA GAG AAA TAG AAT AGC CTC T-3'      66           22 
EXON 22    22F      5'-TCC CAT TGA GAG GTC TTG CT-3'         67           20    1.6    ~300 
           22R      5'-GAG AAG ACT TCT GAG GCT AC-3'         68           20 
EXON 23    23F-1    5'-TGA AGT GAC AGT TCC AGT AGT-3'        69           21    1.2    ~250 
           23R-1    5'-CAT TTT AGC CAT TCA TTC AAC AA-3'     70           23 
EXON 24    24F      5'-ATG AAT TGA CAC TAA TCT CTG C-3'      71           22    1.4    ~285 
           24R      5'-GTA GCC AGG ACA GTA GAA GGA-3'        72           21 

1 M13 tailed 

Thirty-five cycles were performed, each consisting of denaturing (95°C; 30 
seconds), annealing (55°C; 1 minute), and extension (72°C; 90 seconds), except during 
the first cycle in which the denaturing time was increased to 5 minutes, and during the 
last cycle in which the extension time was increased to 5 minutes. 
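
The cycling schedule just described can be written out as a small Python 
sketch, which may help when checking the programmed hold times. The names 
`Step`, `build_program` and `total_seconds` are hypothetical and are introduced 
here only for illustration. Ramp times between temperatures are not stated in 
the text and are not modelled. 

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: int
    seconds: int


def build_program(cycles: int = 35) -> List[List[Step]]:
    """Build the cycle list: 95 C/30 s, 55 C/60 s, 72 C/90 s per cycle.

    The first cycle uses a 5 minute denaturation and the last cycle a
    5 minute extension, as stated in the text.
    """
    program = []
    for index in range(cycles):
        denature = Step("denature", 95, 300 if index == 0 else 30)
        anneal = Step("anneal", 55, 60)
        extend = Step("extend", 72, 300 if index == cycles - 1 else 90)
        program.append([denature, anneal, extend])
    return program


def total_seconds(program: List[List[Step]]) -> int:
    """Sum the hold times of every step (ramp times are ignored)."""
    return sum(step.seconds for cycle in program for step in cycle)


if __name__ == "__main__":
    seconds = total_seconds(build_program())
    print(f"total hold time: {seconds} s ({seconds / 60:.0f} min)")
```

Under these assumptions the hold times alone add up to a little under two 
hours for the thirty-five cycles. 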
PCR products were purified using Qia-quick® PCR purification kits (Qiagen cat# 
28104; Chatsworth, CA). Yield and purity of the PCR product were determined 
spectrophotometrically at OD 260 on a Beckman DU 650 spectrophotometer. 

2. Dideoxy Sequence Analysis 

Fluorescent dye was attached to PCR products for automated sequencing using the Taq 
Dye Terminator® Kit (Perkin-Elmer cat# 401628). DNA sequencing was performed in 
both forward and reverse directions on an Applied Biosystems, Inc. (ABI) Foster City, 
CA., automated Model 377® sequencer. The software used for analysis of the resulting 
data was "Sequence Navigator® software" purchased through ABI. 

3. RESULTS 

Differences in the nucleic acids of the ten alleles from five individuals were found in 
seven locations on the gene. The changes and their positions are found in TABLE III, 
below. 

TABLE III 
PANEL TYPING 

AMINO ACID                   NUCLEOTIDE CHANGE BY SAMPLE 
CHANGE              EXON     1      2      3      4      5       FREQUENCY 

SER(SER) (694)      11E      C/C    C/T    C/T    T/T    T/T     0.4 C, 0.6 T 
LEU(LEU) (771)      11F      T/T    C/T    C/T    C/C    C/C     0.4 T, 0.6 C 
PRO(LEU) (871)      11G      C/T    C/T    C/T    T/T    T/T     0.3 C, 0.7 T 
GLU(GLY) (1038)     11I      A/A    A/G    A/G    G/G    G/G     0.4 A, 0.6 G 
LYS(ARG) (1183)     11J      A/A    A/G    A/G    G/G    G/G     0.4 A, 0.6 G 
SER(SER) (1436)     13       T/T    T/T    T/C    C/C    C/C     0.5 T, 0.5 C 
SER(GLY) (1613)     16       A/A    A/G    A/G    G/G    G/G     0.4 A, 0.6 G 
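
The frequencies in TABLE III follow from counting alleles across the ten 
alleles of the five subjects. A minimal Python sketch of that arithmetic is 
given here; the dictionary layout and function name are assumptions made for 
illustration, and the genotypes are transcribed from the panel typing above. 

```python
from collections import Counter
from typing import Dict, List

# Genotypes of the five subjects at each polymorphic site, keyed by the
# amino acid position given in TABLE III.
PANEL: Dict[int, List[str]] = {
    694: ["C/C", "C/T", "C/T", "T/T", "T/T"],
    771: ["T/T", "C/T", "C/T", "C/C", "C/C"],
    871: ["C/T", "C/T", "C/T", "T/T", "T/T"],
    1038: ["A/A", "A/G", "A/G", "G/G", "G/G"],
    1183: ["A/A", "A/G", "A/G", "G/G", "G/G"],
    1436: ["T/T", "T/T", "T/C", "C/C", "C/C"],
    1613: ["A/A", "A/G", "A/G", "G/G", "G/G"],
}


def allele_frequencies(genotypes: List[str]) -> Dict[str, float]:
    """Count each allele over all genotypes and divide by the allele total."""
    counts = Counter(allele for g in genotypes for allele in g.split("/"))
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


if __name__ == "__main__":
    for position, genotypes in PANEL.items():
        print(position, allele_frequencies(genotypes))
```

With only ten alleles, each frequency is a multiple of 0.1, which is 
consistent with the values reported in the table. 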

Tables III and IV depict one aspect of the invention, sets of at least two alternative 
codon pairs wherein the codon pairs occur in the following frequencies, 
respectively, in a population of individuals free of disease: 

at position 2201, AGC and AGT occur at frequencies from about 35-45%, 
and from about 55-65%, respectively; 

at position 2430, TTG and CTG occur at frequencies from about 35-45%, 
and from about 55-65%, respectively; 

at position 2731, CCG and CTG occur at frequencies from about 25-35%, 
and from about 65-75%, respectively; 

at position 3232, GAA and GGA occur at frequencies from about 35-45%, 
and from about 55-65%, respectively; 

at position 3667, AAA and AGA occur at frequencies from about 35-45%, 
and from about 55-65%, respectively; 

at position 4427, TCT and TCC occur at frequencies from about 45-55%, 
and from about 45-55%, respectively; and 

at position 4956, AGT and GGT occur at frequencies from about 35-45%, 
and from about 55-65%, respectively. 



The data show that for each of the samples, the BRCA1 gene is identical except 
in the region of seven polymorphisms. These polymorphic regions, together with their 
locations, the amino acid groups of each codon, the frequency of their occurrence and 
the amino acid coded for by each codon are found in TABLE IV below. 

TABLE IV 

CODON AND BASE CHANGES IN SEVEN POLYMORPHIC SITES OF BRCA1 GENE 

SAMPLE      BASE     POSITION             CODON       AA         PUBLISHED      FREQUENCY 
NAME        CHANGE   nt/aa        EXON    CHANGE      CHANGE     FREQUENCY²     IN THIS STUDY 

2,3,4,5     C-T      2201/694     11E     AGC(AGT)    SER-SER    UNPUBLISHED    C=40% 
2,3,4,5     T-C      2430/771     11F     TTG(CTG)    LEU-LEU    T=67% (13)     T=40% 
1,2,3,4,5   C-T      2731/871     11G     CCG(CTG)    PRO-LEU    C=34% (12)     C=30% 
2,3,4,5     A-G      3232/1038    11I     GAA(GGA)    GLU-GLY    A=67% (13)     A=40% 
2,3,4,5     A-G      3667/1183    11J     AAA(AGA)    LYS-ARG    A=68% (12)     A=40% 
3,4,5       T-C      4427/1436    13      TCT(TCC)    SER-SER    T=67% (12)     T=50% 
2,3,4,5     A-G      4956/1613    16      AGT(GGT)    SER-GLY    A=67% (12)     A=40% 

² Reference numbers (in parentheses) correspond to the Table of References below. 

EXAMPLE 2 

Determination Of An Individual Using BRCA1(omi1) And The Seven Polymorphisms 
For Reference 

A person skilled in the art of genetic susceptibility testing will find the present 
invention useful for: 

a) identifying individuals having a BRCA1 gene, who therefore have no 
elevated genetic susceptibility to breast or ovarian cancer from a BRCA1 
mutation; and 

b) avoiding misinterpretation of polymorphisms found in the 
BRCA1 gene. 

Sequencing is carried out as in EXAMPLE 1 using a blood sample from the patient in 
question. However, a BRCA1(omi) sequence is used for reference and the polymorphic 
sites are compared to the nucleic acid sequences listed above for codons at each 
polymorphic site. A sample is one which compares to a BRCA1(omi) sequence and 
contains one of the base variations which occur at each of the polymorphic sites. The 
codons which occur at each of the polymorphic sites are paired here for reference. 

• AGC and AGT at position 2201, 
• TTG and CTG at position 2430, 
• CCG and CTG at position 2731, 
• GAA and GGA at position 3232, 
• AAA and AGA at position 3667, 
• TCT and TCC at position 4427, and 
• AGT and GGT at position 4956. 

The availability of these polymorphic pairs provides added assurance that one skilled in 
the art can correctly interpret the polymorphic variations without mistaking a 
variation for a mutation. 

Exon 11 of the BRCA1 gene is subjected to direct dideoxy sequence analysis by 
asymmetric amplification using the polymerase chain reaction (PCR) to generate a 
single stranded product amplified from this DNA sample. Shuldiner, et al., Handbook 
of Techniques in Endocrine Research, p. 457-486, DePablo, F., Scanes, C., eds., Academic 
Press, Inc., 1993. Fluorescent dye is attached for automated sequencing using the Taq 
Dye Terminator® Kit (Perkin-Elmer cat# 401628). DNA sequencing is performed in 
both forward and reverse directions on an Applied Biosystems, Inc. (ABI) automated 
Model 377® sequencer. The software used for analysis of the resulting data is "Sequence 
Navigator® software" purchased through ABI. 

1. Polymerase Chain Reaction (PCR) Amplification 

Genomic DNA (100 nanograms) extracted from white blood cells of the subject is 
amplified in a final volume of 25 microliters containing 1 microliter (100 nanograms) 
genomic DNA, 2.5 microliters 10X PCR buffer (100 mM Tris, pH 8.3, 500 mM KCl, 1.2 
mM MgCl2), 2.5 microliters 10X dNTP mix (2 mM each nucleotide), 2.5 microliters 
forward primer (BRCA1-11K-F, 10 micromolar solution), 2.5 microliters reverse primer 
(BRCA1-11K-R, 10 micromolar solution), 1 microliter Taq polymerase (5 units), and 
13 microliters of water. 
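
As a check on the reaction set-up described above, the following Python sketch 
adds up the component volumes and computes the final concentration of each 
stock after dilution into the reaction. The volumes and stock concentrations 
are taken from the text; the dictionary layout and function names are 
assumptions made for illustration. 

```python
import math
from typing import Dict, Tuple

# Component volumes in microliters, as listed in the text.
REACTION_UL: Dict[str, float] = {
    "genomic DNA": 1.0,
    "10X PCR buffer": 2.5,
    "10X dNTP mix": 2.5,
    "forward primer": 2.5,
    "reverse primer": 2.5,
    "Taq polymerase": 1.0,
    "water": 13.0,
}

# Solute -> (stock concentration, component that carries it).
STOCKS: Dict[str, Tuple[float, str]] = {
    "Tris (mM)": (100.0, "10X PCR buffer"),
    "KCl (mM)": (500.0, "10X PCR buffer"),
    "MgCl2 (mM)": (1.2, "10X PCR buffer"),
    "each dNTP (mM)": (2.0, "10X dNTP mix"),
    "forward primer (uM)": (10.0, "forward primer"),
    "reverse primer (uM)": (10.0, "reverse primer"),
}


def final_concentration(stock: float, added_ul: float, total_ul: float) -> float:
    """Dilution rule: C_final = C_stock * V_added / V_total."""
    return stock * added_ul / total_ul


def summarize() -> Tuple[float, Dict[str, float]]:
    """Return the total reaction volume and every final concentration."""
    total = sum(REACTION_UL.values())
    finals = {
        solute: final_concentration(conc, REACTION_UL[source], total)
        for solute, (conc, source) in STOCKS.items()
    }
    return total, finals


if __name__ == "__main__":
    total_ul, finals = summarize()
    print(f"total volume: {total_ul} microliters")
    for solute, value in finals.items():
        print(f"{solute}: {value:g}")
```

The component volumes sum to the stated final volume, and each 2.5 microliter 
addition is diluted tenfold in the reaction. 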

The PCR primers used to amplify a patient's sample BRCA1 gene are listed in Table II. 
The primers were synthesized on a DNA/RNA Model 394® Synthesizer. Thirty-five 
cycles of amplification are performed, each consisting of denaturing (95°C; 30 
seconds), annealing (55°C; 1 minute), and extension (72°C; 90 seconds), except during 
the first cycle in which the denaturing time is increased to 5 minutes, and during the 
last cycle in which the extension time is increased to 5 minutes. 

PCR products are purified using Qia-quick® PCR purification kits (Qiagen, cat# 28104; 
Chatsworth, CA). Yield and purity of the PCR product are determined 
spectrophotometrically at OD 260 on a Beckman DU 650 spectrophotometer. 

2. Dideoxy Sequence Analysis 

Fluorescent dye is attached to PCR products for automated sequencing using the Taq 
Dye Terminator® Kit (Perkin-Elmer cat# 401628). DNA sequencing is performed in 
both forward and reverse directions on an Applied Biosystems, Inc. (ABI) Foster City, 
CA., automated Model 377® sequencer. The software used for analysis of the resulting 
data is "Sequence Navigator® software" purchased through ABI. The BRCA1(omi1) SEQ. 
ID. NO.:1 sequence is entered into the Sequence Navigator® software as the Standard for 
comparison. The Sequence Navigator® software compares the sample sequence to the 
BRCA1(omi1) SEQ. ID. NO.:1 standard, base by base. The Sequence Navigator® software 
highlights all differences between the BRCA1(omi1) SEQ. ID. NO.:1 DNA sequence and 
the patient's sample sequence. 
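
The base-by-base comparison against a standard can be sketched in a few lines 
of Python. This is only an illustration of the idea and does not reproduce the 
Sequence Navigator® software. It assumes the two sequences are already aligned 
and of equal length, and the function name and the short test sequences are 
hypothetical. 

```python
from typing import List, Tuple


def find_differences(standard: str, sample: str) -> List[Tuple[int, str, str]]:
    """Compare two pre-aligned sequences base by base.

    Returns a list of (1-based position, standard base, sample base) for
    every position at which the sample differs from the standard.
    Assumes equal length; no alignment is performed.
    """
    if len(standard) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    differences = []
    pairs = zip(standard.upper(), sample.upper())
    for position, (ref_base, obs_base) in enumerate(pairs, start=1):
        if ref_base != obs_base:
            differences.append((position, ref_base, obs_base))
    return differences


if __name__ == "__main__":
    # Toy sequences for illustration only; not BRCA1 data.
    print(find_differences("AGCTTGCCG", "AGTTTGCTG"))
```

Each reported difference would then be reviewed by a person, as described in 
the steps that follow. 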

A first technologist checks the computerized results by comparing visually the 
BRCA1(omi1) SEQ. ID. NO.:1 standard against the patient's sample, and again highlights 
any differences between the standard and the sample. The first primary technologist 
then interprets the sequence variations at each position along the sequence. 
Chromatograms from each sequence variation are generated by the Sequence 
Navigator® software and printed on a color printer. The peaks are interpreted by the 
first primary technologist and a second primary technologist. A secondary technologist 
then reviews the chromatograms. The results are finally interpreted by a geneticist. In 
each instance, a variation is compared to known polymorphisms for position and base 
change. If the sample BRCA1 sequence matches the BRCA1(omi1) SEQ. ID. NO.:1 
standard, with only variations within the known list of polymorphisms, it is 
interpreted as a gene sequence. 
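
The check of each variation against the known polymorphisms for position and 
base change can be sketched as a simple lookup. The Python sketch below is an 
illustration under stated assumptions, not the laboratory procedure itself: it 
assumes that variations are reported at the nucleotide positions used in TABLE 
IV, and the names `KNOWN_POLYMORPHISMS` and `classify` are hypothetical. 

```python
from typing import Dict, List, Set, Tuple

# Nucleotide position -> bases observed at that site, taken from the
# base changes listed in TABLE IV.
KNOWN_POLYMORPHISMS: Dict[int, Set[str]] = {
    2201: {"C", "T"},
    2430: {"T", "C"},
    2731: {"C", "T"},
    3232: {"A", "G"},
    3667: {"A", "G"},
    4427: {"T", "C"},
    4956: {"A", "G"},
}


def classify(variations: List[Tuple[int, str]]) -> List[Tuple[int, str, str]]:
    """Label each (position, observed base) as a known polymorphism or not.

    Anything not matching the table in both position and base is flagged
    for further review rather than interpreted automatically.
    """
    results = []
    for position, observed in variations:
        allowed = KNOWN_POLYMORPHISMS.get(position)
        if allowed is not None and observed.upper() in allowed:
            label = "known polymorphism"
        else:
            label = "needs review"
        results.append((position, observed, label))
    return results


if __name__ == "__main__":
    # Hypothetical variations for illustration only.
    for row in classify([(2201, "T"), (3232, "C"), (1234, "A")]):
        print(row)
```

A sample whose variations are all labeled as known polymorphisms corresponds to 
the case described above; any other variation is left for interpretation by the 
reviewers. 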

EXAMPLE 3 

DETERMINING THE ABSENCE OF A MUTATION IN THE BRCA1 GENE USING 
BRCA1(omi) AND SEVEN POLYMORPHISMS FOR REFERENCE 

A person skilled in the art of genetic susceptibility testing will find the present 
invention useful for determining the presence of a known or previously unknown 
mutation in the BRCA1 gene. A list of mutations of BRCA1 is publicly available in the 
Breast Cancer Information Core at: 
http://www.nchgr.nih.gov/dir/lab_transfer/bic. This data site became publicly 
available on November 1, 1995. Friend, S. et al., Nature Genetics 11:238 (1995). 

Sequencing is carried out as in EXAMPLE 1 using a blood sample from the patient in 
question. However, a BRCA1(omi) sequence is used for reference and polymorphic sites 
are compared to the nucleic acid sequences listed above for codons at each polymorphic 
site. A sample is one which compares to the BRCA1(omi2) SEQ. ID. NO.: 3 sequence and 
contains one of the base variations which occur at each of the polymorphic sites. The 
codons which occur at each of the polymorphic sites are paired here for reference. 
• AGC and AGT at position 2201,
• TTG and CTG at position 2430,
• CCG and CTG at position 2731,
• GAA and GGA at position 3232,
• AAA and AGA at position 3667,
• TCT and TCC at position 4427, and
• AGT and GGT at position 4956.

The availability of these polymorphic pairs provides added assurance that one skilled in 
the art can correctly interpret the polymorphic variations without mistaking a 
variation for a mutation. 
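
The use of the seven polymorphic pairs listed above can be sketched in a few lines of Python. This is only an illustration, not the procedure followed by the technologists or by the Sequence Navigator® software: the function name and the two result labels are hypothetical, and the input is assumed to be a codon that has already been flagged as differing from the reference.

```python
# Codon pairs at the seven polymorphic sites, copied from the list above.
KNOWN_POLYMORPHISMS = {
    2201: {"AGC", "AGT"},
    2430: {"TTG", "CTG"},
    2731: {"CCG", "CTG"},
    3232: {"GAA", "GGA"},
    3667: {"AAA", "AGA"},
    4427: {"TCT", "TCC"},
    4956: {"AGT", "GGT"},
}


def classify_variation(position: int, codon: str) -> str:
    """Label a flagged variation (hypothetical labels, for illustration only).

    `position` is the nucleotide position of the polymorphic site and `codon`
    is the codon observed in the sample at that site.
    """
    allowed = KNOWN_POLYMORPHISMS.get(position)
    if allowed is not None and codon.upper() in allowed:
        return "known polymorphism"
    return "possible mutation"


if __name__ == "__main__":
    print(classify_variation(3667, "AGA"))  # one of the listed pair at 3667
    print(classify_variation(3667, "ACA"))  # not in the listed pair
```

A variation at a position outside the table is always reported as a possible mutation in this sketch, which mirrors the idea that only the listed pairs are treated as known polymorphisms.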

Exon 11 of the BRCA1 gene is subjected to direct dideoxy sequence analysis by
asymmetric amplification using the polymerase chain reaction (PCR) to generate a
single stranded product amplified from this DNA sample. Shuldiner, et al., Handbook
of Techniques in Endocrine Research, p. 457-486, DePablo, F., Scanes, C., eds., Academic
Press, Inc., 1993. Fluorescent dye is attached for automated sequencing using the Taq
Dye Terminator® Kit (Perkin-Elmer cat# 401628). DNA sequencing is performed in
both forward and reverse directions on an Applied Biosystems, Inc. (ABI) automated
Model 377® sequencer. The software used for analysis of the resulting data is "Sequence
Navigator® software" purchased through ABI.

1. Polymerase Chain Reaction (PCR) Amplification 

Genomic DNA (100 nanograms) extracted from white blood cells of the subject is
amplified in a final volume of 25 microliters containing 1 microliter (100 nanograms)
genomic DNA, 2.5 microliters 10X PCR buffer (100 mM Tris, pH 8.3, 500 mM KCl, 1.2
mM MgCl2), 2.5 microliters 10X dNTP mix (2 mM each nucleotide), 2.5 microliters
forward primer (BRCA1-11K-F, 10 micromolar solution), 2.5 microliters reverse primer
(BRCA1-11K-R, 10 micromolar solution), 1 microliter Taq polymerase (5 units), and
13 microliters of water.
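
As a quick arithmetic check on the reaction mix described above, the component volumes can be summed and the working concentrations of the primers and nucleotides derived by simple dilution. The sketch below uses only the volumes and stock concentrations stated in the text; the function and variable names are illustrative.

```python
# Component volumes in microliters, as stated for the 25 microliter reaction.
COMPONENT_VOLUMES_UL = {
    "genomic DNA": 1.0,
    "10X PCR buffer": 2.5,
    "10X dNTP mix": 2.5,
    "forward primer (BRCA1-11K-F)": 2.5,
    "reverse primer (BRCA1-11K-R)": 2.5,
    "Taq polymerase": 1.0,
    "water": 13.0,
}
FINAL_VOLUME_UL = 25.0


def total_volume(volumes: dict) -> float:
    """Sum of all component volumes in microliters."""
    return sum(volumes.values())


def final_concentration(stock: float, volume_ul: float,
                        final_volume_ul: float = FINAL_VOLUME_UL) -> float:
    """Concentration after dilution, in the same unit as `stock`."""
    return stock * volume_ul / final_volume_ul


if __name__ == "__main__":
    print(total_volume(COMPONENT_VOLUMES_UL))   # microliters in the tube
    print(final_concentration(10.0, 2.5))       # each primer, micromolar
    print(final_concentration(2.0, 2.5))        # each dNTP, millimolar
```

Under these figures the components add up to the stated 25 microliters, each primer is present at 1 micromolar, and each nucleotide at 0.2 mM.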

The PCR primers used to amplify a patient's sample BRCA1 gene are listed in Table II.
The primers were synthesized on a DNA/RNA Model 394® Synthesizer. Thirty-five
cycles of amplification are performed, each consisting of denaturing (95°C; 30
seconds), annealing (55°C; 1 minute), and extension (72°C; 90 seconds), except during
the first cycle in which the denaturing time is increased to 5 minutes, and during the
last cycle in which the extension time is increased to 5 minutes.

PCR products are purified using Qia-quick® PCR purification kits (Qiagen, cat#
28104; Chatsworth, CA). Yield and purity of the PCR product are determined
spectrophotometrically at OD 260 on a Beckman DU 650 spectrophotometer.

2. Dideoxy Sequence Analysis 

Fluorescent dye is attached to PCR products for automated sequencing using the Taq
Dye Terminator® Kit (Perkin-Elmer cat# 401628). DNA sequencing is performed in
both forward and reverse directions on an Applied Biosystems, Inc. (ABI) Foster City,
CA., automated Model 377® sequencer. The software used for analysis of the resulting
data is "Sequence Navigator® software" purchased through ABI. The BRCA1(omi2) SEQ.
ID. NO.: 3 sequence is entered into the Sequence Navigator® software as the Standard
for comparison. The Sequence Navigator® software compares the sample sequence to
the BRCA1(omi2) SEQ. ID. NO.: 3 standard, base by base. The Sequence Navigator®
software highlights all differences between the BRCA1(omi2) SEQ. ID. NO.: 3 DNA
sequence and the patient's sample sequence.
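
The base-by-base comparison described above can be pictured with a minimal Python sketch. It is not the Sequence Navigator® algorithm, whose internals are not described here; it assumes the two reads are already aligned and of equal length, and the ten-base fragments in the usage lines are illustrative only.

```python
def find_differences(standard: str, sample: str, start: int = 1):
    """Return (position, standard_base, sample_base) for every mismatch.

    Assumes `standard` and `sample` are aligned and of equal length;
    `start` is the nucleotide number of the first base compared.
    """
    if len(standard) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    pairs = zip(standard.upper(), sample.upper())
    return [(start + i, s, p) for i, (s, p) in enumerate(pairs) if s != p]


if __name__ == "__main__":
    # Illustrative fragments only; numbering starts at nucleotide 3661.
    standard = "TCCAGAGAGG"
    sample = "TCCAGAAAGG"
    for position, expected, observed in find_differences(standard, sample, 3661):
        print(f"position {position}: standard {expected}, sample {observed}")
```

Each reported position would then be checked against the list of known polymorphisms before being interpreted further.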

A first technologist checks the computerized results by visually comparing the
BRCA1(omi2) SEQ. ID. NO.: 3 standard against the patient's sample, and again highlights
any differences between the standard and the sample. The first primary technologist
then interprets the sequence variations at each position along the sequence.
Chromatograms from each sequence variation are generated by the Sequence
Navigator® software and printed on a color printer. The peaks are interpreted by the
first primary technologist and also by a second primary technologist. A secondary
technologist then reviews the chromatograms. The results are finally interpreted by a
geneticist. In each instance, a variation is compared to known polymorphisms for
position and base change. If the sample BRCA1 sequence matches the BRCA1(omi2) SEQ.
ID. NO.: 3 standard, with only variations within the known list of polymorphisms, it is
interpreted as a gene sequence.

EXAMPLE 4

DETERMINING THE PRESENCE OF A MUTATION IN THE BRCA1 GENE USING
BRCA1(omi1) AND SEVEN POLYMORPHISMS FOR REFERENCE

A person skilled in the art of genetic susceptibility testing will find the present
invention useful for determining the presence of a known or previously unknown
mutation in the BRCA1 gene. A list of mutations of BRCA1 is publicly available in the
Breast Cancer Information Core at:

http://www.nchgr.nih.gov/dir/lab_transfer/bic. This data site became publicly
available on November 1, 1995. Friend, S. et al., Nature Genetics 11:238 (1995). In this
example, a mutation in exon 11 is characterized by amplifying the region of the
mutation with a primer which matches the region of the mutation.

Exon 11 of the BRCA1 gene is subjected to direct dideoxy sequence analysis by
asymmetric amplification using the polymerase chain reaction (PCR) to generate a
single stranded product amplified from this DNA sample. Shuldiner, et al., Handbook
of Techniques in Endocrine Research, p. 457-486, DePablo, F., Scanes, C., eds., Academic
Press, Inc., 1993. Fluorescent dye is attached for automated sequencing using the Taq
Dye Terminator® Kit (Perkin-Elmer cat# 401628). DNA sequencing is performed in
both forward and reverse directions on an Applied Biosystems, Inc. (ABI) automated
Model 377® sequencer. The software used for analysis of the resulting data is "Sequence
Navigator® software" purchased through ABI.

1. Polymerase Chain Reaction (PCR) Amplification 

Genomic DNA (100 nanograms) extracted from white blood cells of the subject is
amplified in a final volume of 25 microliters containing 1 microliter (100 nanograms)
genomic DNA, 2.5 microliters 10X PCR buffer (100 mM Tris, pH 8.3, 500 mM KCl, 1.2
mM MgCl2), 2.5 microliters 10X dNTP mix (2 mM each nucleotide), 2.5 microliters
forward primer (BRCA1-11K-F, 10 micromolar solution), 2.5 microliters reverse primer
(BRCA1-11K-R, 10 micromolar solution), 1 microliter Taq polymerase (5 units), and
13 microliters of water.

The PCR primers used to amplify segment K of exon 11 (where the mutation is found)
are as follows:
BRCA1-11K-F: 5'-GCA AAA GCG TCC AGA AAG GA-3' SEQ ID NO:69
BRCA1-11K-R: 5'-AGT CTT CCA ATT CAC TGC AC-3' SEQ ID NO:70
The primers are synthesized on a DNA/RNA Model 394® Synthesizer.
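
Basic properties of the two primers listed above can be derived directly from their sequences. The sketch below computes length, GC content and a rough melting temperature using the Wallace rule (2°C per A or T, 4°C per G or C). The Wallace rule is a general rule of thumb for short oligonucleotides and is not taken from this document; the helper names are illustrative.

```python
FORWARD = "GCA AAA GCG TCC AGA AAG GA"   # BRCA1-11K-F, SEQ ID NO:69
REVERSE = "AGT CTT CCA ATT CAC TGC AC"   # BRCA1-11K-R, SEQ ID NO:70

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(primer: str) -> str:
    """Remove the triplet spacing and normalize to upper case."""
    return primer.replace(" ", "").upper()


def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    seq = clean(primer)
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(primer: str) -> int:
    """Rule-of-thumb melting temperature in degrees C (Wallace rule)."""
    seq = clean(primer)
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def reverse_complement(primer: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return clean(primer).translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    for name, primer in (("BRCA1-11K-F", FORWARD), ("BRCA1-11K-R", REVERSE)):
        print(name, len(clean(primer)), gc_percent(primer), wallace_tm(primer))
    print(reverse_complement(REVERSE))
```

The reverse complement of the reverse primer is the form in which that primer's binding site would appear when reading the coding strand.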
Thirty-five cycles are performed, each consisting of denaturing (95°C; 30 seconds),
annealing (55°C; 1 minute), and extension (72°C; 90 seconds), except during the first
cycle in which the denaturing time is increased to 5 minutes, and during the last cycle
in which the extension time is increased to 5 minutes.
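
The cycling program described above can be totalled with a short calculation. The sketch counts only the programmed hold times stated in the text; ramp times between temperatures depend on the instrument and are not included.

```python
def program_seconds(cycles: int = 35, denature: int = 30, anneal: int = 60,
                    extend: int = 90, first_denature: int = 300,
                    last_extend: int = 300) -> int:
    """Total programmed hold time in seconds for the cycling profile.

    The first cycle uses the longer denaturing step and the last cycle
    uses the longer extension step, as described in the text.
    """
    total = 0
    for cycle in range(1, cycles + 1):
        d = first_denature if cycle == 1 else denature
        e = last_extend if cycle == cycles else extend
        total += d + anneal + e
    return total


if __name__ == "__main__":
    seconds = program_seconds()
    print(seconds, "seconds, or", seconds / 60, "minutes of hold time")
```

With the stated times, each ordinary cycle lasts 180 seconds, so the complete program amounts to a little under two hours of hold time.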

PCR products are purified using Qia-quick® PCR purification kits (Qiagen, cat# 28104;
Chatsworth, CA). Yield and purity of the PCR product are determined
spectrophotometrically at OD 260 on a Beckman DU 650 spectrophotometer.

2. Dideoxy Sequence Analysis 

Fluorescent dye is attached to PCR products for automated sequencing using the Taq
Dye Terminator® Kit (Perkin-Elmer cat# 401628). DNA sequencing is performed in
both forward and reverse directions on an Applied Biosystems, Inc. (ABI) Foster City,
CA., automated Model 377® sequencer. The software used for analysis of the resulting
data is "Sequence Navigator® software" purchased through ABI. The BRCA1(omi2) SEQ.
ID. NO.: 3 sequence is entered into the Sequence Navigator® software as the Standard
for comparison. The Sequence Navigator® software compares the sample sequence to
the BRCA1(omi2) SEQ. ID. NO.: 3 standard, base by base. The Sequence Navigator®
software highlights all differences between the BRCA1(omi2) SEQ. ID. NO.: 3 DNA
sequence and the patient's sample sequence.

A first technologist checks the computerized results by visually comparing the
BRCA1(omi2) SEQ. ID. NO.: 3 standard against the patient's sample, and again highlights
any differences between the standard and the sample. The first primary technologist
then interprets the sequence variations at each position along the sequence.
Chromatograms from each sequence variation are generated by the Sequence
Navigator® software and printed on a color printer. The peaks are interpreted by the
first primary technologist and a second primary technologist. A secondary technologist
then reviews the chromatograms. The results are finally interpreted by a geneticist. In
each instance, a variation is compared to known polymorphisms for position and base
change. Mutations are noted by the length of non-matching variation. Such a lengthy
mismatch pattern occurs with deletions and substitutions.

3. Result 

Using the above PCR amplification and standard fluorescent sequencing
technology, the 3888delGA mutation may be found. The 3888delGA mutation of the
BRCA1 gene lies in segment "K" of exon 11. The DNA sequence results demonstrate
the presence of a two base pair deletion at nucleotides 3888 and 3889 of the published
BRCA1(omi) sequence. This mutation interrupts the reading frame of the BRCA1
transcript, resulting in the appearance of an in-frame terminator (TAG) at codon
position 1265. This mutation is, therefore, predicted to result in a truncated, and most
likely non-functional, protein. The formal name of the mutation will be 3888delGA.
This mutation is named in accordance with the suggested nomenclature for naming
mutations, Beaudet, A. et al., Human Mutation 2:245-248 (1993).
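
The effect of the two-base deletion on the reading frame can be reproduced with a short sketch. The 33-base fragment below is read from nucleotides 3888-3920 of the SEQ ID NO:1 listing in this document; treating nucleotide 3888 as the first base of codon 1257 assumes that translation begins at nucleotide 120 of that listing. The function names are illustrative.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Nucleotides 3888-3920 as read from the SEQ ID NO:1 listing (assumption:
# nucleotide 3888 is the first base of codon 1257).
FRAGMENT_START_NT = 3888
FRAGMENT_START_CODON = 1257
WILD_TYPE = "GAGGAGAATTTATTATCATTGAAGAATAGCTTA"


def delete_bases(seq: str, seq_start_nt: int, first_nt: int, count: int) -> str:
    """Remove `count` bases beginning at nucleotide `first_nt`."""
    offset = first_nt - seq_start_nt
    return seq[:offset] + seq[offset + count:]


def first_stop_codon(seq: str, start_codon: int):
    """Return (codon_number, codon) of the first in-frame stop, or None."""
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            return start_codon + i // 3, codon
    return None


if __name__ == "__main__":
    mutant = delete_bases(WILD_TYPE, FRAGMENT_START_NT, 3888, 2)
    print("wild type:", first_stop_codon(WILD_TYPE, FRAGMENT_START_CODON))
    print("3888delGA:", first_stop_codon(mutant, FRAGMENT_START_CODON))
```

In the unaltered fragment no stop codon is met in frame, whereas removing the GA at 3888-3889 shifts the frame so that a TAG is read at codon 1265, in agreement with the result stated above.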

EXAMPLE 5

USE OF THE BRCA1(omi1) GENE THERAPY

The growth of ovarian, breast or prostate cancer can be arrested by increasing the
expression of the BRCA1 gene where inadequate expression of that gene is responsible
for hereditary ovarian, breast and prostate cancer. It has been demonstrated that
transfection of BRCA1 into cancer cells inhibits their growth and reduces
tumorigenesis. Gene therapy is performed on a patient to reduce the size of a tumor.
The LXSN vector is transformed with any of the BRCA1(omi1) SEQ. ID. NO.:1,
BRCA1(omi2) SEQ. ID. NO.:3, or BRCA1(omi3) SEQ. ID. NO.:5 coding region.

Vector 

The LXSN vector is transformed with wildtype BRCA1(omi1) SEQ. ID. NO.:1 coding
sequence. The LXSN-BRCA1(omi1) retroviral expression vector is constructed by cloning
a SalI-linkered BRCA1(omi1) cDNA (nucleotides 1-5711) into the XhoI site of the vector
LXSN. Constructs are confirmed by DNA sequencing. Holt et al., Nature Genetics 12:
298-302 (1996).

Retroviral vectors are manufactured from viral producer cells using serum free and
phenol-red free conditions and tested for sterility, absence of specific pathogens, and
absence of replication-competent retrovirus by standard assays. Retrovirus is stored
frozen in aliquots which have been tested.

Patients receive a complete physical exam, blood, and urine tests to determine
overall health. They may also have a chest X-ray, electrocardiogram, and appropriate
radiologic procedures to assess tumor stage.

Patients with metastatic ovarian cancer are treated with retroviral gene therapy by
infusion of recombinant LXSN-BRCA1(omi1) retroviral vectors into peritoneal sites
containing tumor, between 10^9 and 10^10 viral particles per dose. Blood samples are
drawn each day and tested for the presence of retroviral vector by sensitive polymerase
chain reaction (PCR)-based assays. The fluid which is removed is analyzed to
determine:

1. The percentage of cancer cells which are taking up the recombinant
LXSN-BRCA1(omi1) retroviral vector combination. Successful transfer of BRCA1 gene into
cancer cells is shown by both RT-PCR analysis and in situ hybridization.
RT-PCR is performed by the method of Thompson et al., Nature Genetics 9: 444-450
(1995), using primers derived from BRCA1(omi1) SEQ. ID. NO.:1. Cell lysates are
prepared and immunoblotting is performed by the method of Jensen et al., Nature
Genetics 12: 303-308 (1996) and Jensen et al., Biochemistry 31: 10887-10892 (1992).

2. Presence of programmed cell death using ApoTAG® in situ apoptosis detection kit
(Oncor, Inc., Gaithersburg, Maryland) and DNA analysis.

3. Measurement of BRCA1 gene expression by slide immunofluorescence or western
blot.

Patients with measurable disease are also evaluated for a clinical response to
LXSN-BRCA1, especially those that do not undergo a palliative intervention immediately
after retroviral vector therapy. Fluid cytology, abdominal girth, CT scans of the abdomen,
and local symptoms are followed.

For other sites of disease, conventional response criteria are used as follows:

1. Complete Response (CR), complete disappearance of all measurable lesions
and of all signs and symptoms of disease for at least 4 weeks.

2. Partial Response (PR), decrease of at least 50% of the sum of the products of the
2 largest perpendicular diameters of all measurable lesions as determined by 2
observations not less than 4 weeks apart. To be considered a PR, no new lesions should
have appeared during this period and none should have increased in size.

3. Stable Disease, less than 25% change in tumor volume from previous
evaluations.

4. Progressive Disease, greater than 25% increase in tumor measurements from
prior evaluations.
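
The four response categories above can be expressed as a simple decision rule. The sketch below is a simplification made for illustration: it reduces each evaluation to a single percent change (negative for shrinkage), it does not model the 4-week confirmation intervals, and it returns a hypothetical "Unclassified" label for changes that none of the four stated criteria covers (for example, a 30% decrease).

```python
def classify_response(percent_change: float,
                      all_lesions_gone: bool = False,
                      new_lesions: bool = False,
                      any_lesion_increased: bool = False) -> str:
    """Map one evaluation onto the response categories listed above.

    `percent_change` is the change in the tumor measurement relative to the
    prior evaluation; negative values mean the tumor has shrunk.
    """
    if all_lesions_gone:
        return "Complete Response"
    if percent_change <= -50 and not new_lesions and not any_lesion_increased:
        return "Partial Response"
    if percent_change > 25:
        return "Progressive Disease"
    if abs(percent_change) < 25:
        return "Stable Disease"
    return "Unclassified"  # hypothetical label: not covered by the criteria


if __name__ == "__main__":
    for change in (-60, -30, 10, 40):
        print(change, classify_response(change))
```

A clinical assessment would also take the timing of the observations and the signs and symptoms of disease into account, which this sketch leaves out.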

The number of doses depends upon the response to treatment.
For further information related to this gene therapy approach, see "BRCA1
Retroviral Gene Therapy for Ovarian Cancer", a Human Gene Transfer Protocol: NIH
ORDA Registration #: 9603-149, Jeffrey Holt, JT, M.D. and Carlos L. Arteaga, M.D.

"Breast and Ovarian cancer" is understood by those skilled in the art to
include breast and ovarian cancer in women and also breast and prostate cancer in men.
BRCA1 is associated with genetic susceptibility to inherited breast and ovarian cancer in
women and also breast and prostate cancer in men. Therefore, claims in this document
which recite breast and/or ovarian cancer refer to breast, ovarian and prostate cancers in
men and women. Although the invention has been described with reference to the
presently preferred embodiments, it should be understood that various modifications
can be made without departing from the spirit of the invention. Accordingly, the
invention is limited only by the following claims.

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT: Murphy, Patricia D.
    Allen, Antonette C.
    Alvares, Christopher P.
    Critz, Brenda S.
    Olson, Sheri J.
    Schelter, Denise B.
    Zeng, Bin

(ii) TITLE OF INVENTION: A Sequence of the Human BRCA1 Gene

(iii) NUMBER OF SEQUENCES: 78

(iv) CORRESPONDENCE ADDRESS:
    (A) ADDRESSEE: ONCORMED
    (B) STREET: 200 PERRY PARKWAY
    (C) CITY: GAITHERSBURG
    (D) STATE: MD
    (E) COUNTRY: USA
    (F) ZIP: 20877

(v) COMPUTER READABLE FORM:
    (A) MEDIUM TYPE: Floppy disk
    (B) COMPUTER: IBM PC compatible
    (C) OPERATING SYSTEM: PC-DOS/MS-DOS
    (D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA:
    (A) APPLICATION NUMBER: to be assigned
    (B) FILING DATE: herewith
    (C) CLASSIFICATION:

(viii) ATTORNEY/AGENT INFORMATION:
    (A) NAME: R. THOMAS GALLEGOS
    (B) REGISTRATION NUMBER: 32,692
    (C) REFERENCE/DOCKET NUMBER: PA-0054

(ix) TELECOMMUNICATION INFORMATION:
    (A) TELEPHONE: 301-527-2051
    (B) TELEFAX: 301-208-6997



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 5711 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: not relevant
    (D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(vi) ORIGINAL SOURCE:
    (A) ORGANISM: Homo sapiens
    (B) STRAIN: BRCA1

(viii) POSITION IN GENOME:
    (A) CHROMOSOME/SEGMENT: 17
    (B) MAP POSITION: 17q21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

AGCTCGCTGA GACTTCCTGG ACCCCGCACC AGGCTGTGGG GTTTCTCAGA TAACTGGGCC 60
CCTGCGCTCA GGAGGCCTTC ACCCTCTGCT CTGGGTAAAG TTCATTGGAA CAGAAAGAAA 120
TGGATTTATC TGCTCTTCGC GTTGAAGAAG TACAAAATGT CATTAATGCT ATGCAGAAAA 180
TCTTAGAGTG TCCCATCTGT CTGGAGTTGA TCAAGGAACC TGTCTCCACA AAGTGTGACC 240
ACATATTTTG CAAATTTTGC ATGCTGAAAC TTCTCAACCA GAAGAAAGGG CCTTCACAGT 300
GTCCTTTATG TAAGAATGAT ATAACCAAAA GGAGCCTACA AGAAAGTACG AGATTTAGTC 360
AACTTGTTGA AGAGCTATTG AAAATCATTT GTGCTTTTCA GCTTGACACA GGTTTGGAGT 420
ATGCAAACAG CTATAATTTT GCAAAAAAGG AAAATAACTC TCCTGAACAT CTAAAAGATG 480
AAGTTTCTAT CATCCAAAGT ATGGGCTACA GAAACCGTGC CAAAAGACTT CTACAGAGTG 540
AACCCGAAAA TCCTTCCTTG CAGGAAACCA GTCTCAGTGT CCAACTCTCT AACCTTGGAA 600
CTGTGAGAAC TCTGAGGACA AAGCAGCGGA TACAACCTCA AAAGACGTCT GTCTACATTG 660
AATTGGGATC TGATTCTTCT GAAGATACCG TTAATAAGGC AACTTATTGC AGTGTGGGAG 720
ATCAAGAATT GTTACAAATC ACCCCTCAAG GAACCAGGGA TGAAATCAGT TTGGATTCTG 780
CAAAAAAGGC TGCTTGTGAA TTTTCTGAGA CGGATGTAAC AAATACTGAA CATCATCAAC 840
CCAGTAATAA TGATTTGAAC ACCACTGAGA AGCGTGCAGC TGAGAGGCAT CCAGAAAAGT 900
ATCAGGGTAG TTCTGTTTCA AACTTGCATG TGGAGCCATG TGGCACAAAT ACTCATGCCA 960
GCTCATTACA GCATGAGAAC AGCAGTTTAT TACTCACTAA AGACAGAATG AATGTAGAAA 1020
AGGCTGAATT CTGTAATAAA AGCAAACAGC CTGGCTTAGC AAGGAGCCAA CATAACAGAT 1080
GGGCTGGAAG TAAGGAAACA TGTAATGATA GGCGGACTCC CAGCACAGAA AAAAAGGTAG 1140
ATCTGAATGC TGATCCCCTG TGTGAGAGAA AAGAATGGAA TAAGCAGAAA CTGCCATGCT 1200
CAGAGAATCC TAGAGATACT GAAGATGTTC CTTGGATAAC ACTAAATAGC AGCATTCAGA 1260
AAGTTAATGA GTGGTTTTCC AGAAGTGATG AACTGTTAGG TTCTGATGAC TCACATGATG 1320
GGGAGTCTGA ATCAAATGCC AAAGTAGCTG ATGTATTGGA CGTTCTAAAT GAGGTAGATG 1380
AATATTCTGG TTCTTCAGAG AAAATAGACT TACTGGCCAG TGATCCTCAT GAGGCTTTAA 1440
TATGTAAAAG TGAAAGAGTT CACTCCAAAT CAGTAGAGAG TAATATTGAA GACAAAATAT 1500
TTGGGAAAAC CTATCGGAAG AAGGCAAGCC TCCCCAACTT AAGCCATGTA ACTGAAAATC 1560
TAATTATAGG AGCATTTGTT ACTGAGCCAC AGATAATACA AGAGCGTCCC CTCACAAATA 1620
AATTAAAGCG TAAAAGGAGA CCTACATCAG GCCTTCATCC TGAGGATTTT ATCAAGAAAG 1680
CAGATTTGGC AGTTCAAAAG ACTCCTGAAA TGATAAATCA GGGAACTAAC CAAACGGAGC 1740
AGAATGGTCA AGTGATGAAT ATTACTAATA GTGGTCATGA GAATAAAACA AAAGGTGATT 1800
CTATTCAGAA TGAGAAAAAT CCTAACCCAA TAGAATCACT CGAAAAAGAA TCTGCTTTCA 1860
AAACGAAAGC TGAACCTATA AGCAGCAGTA TAAGCAATAT GGAACTCGAA TTAAATATCC 1920
ACAATTCAAA AGCACCTAAA AAGAATAGGC TGAGGAGGAA GTCTTCTACC AGGCATATTC 1980
ATGCGCTTGA ACTAGTAGTC AGTAGAAATC TAAGCCCACC TAATTGTACT GAATTGCAAA 2040
TTGATAGTTG TTCTAGCAGT GAAGAGATAA AGAAAAAAAA GTACAACCAA ATGCCAGTCA 2100
GGCACAGCAG AAACCTACAA CTCATGGAAG GTAAAGAACC TGCAACTGGA GCCAAGAAGA 2160
GTAACAAGCC AAATGAACAG ACAAGTAAAA GACATGACAG TGATACTTTC CCAGAGCTGA 2220
AGTTAACAAA TGCACCTGGT TCTTTTACTA AGTGTTCAAA TACCAGTGAA CTTAAAGAAT 2280
TTGTCAATCC TAGCCTTCCA AGAGAAGAAA AAGAAGAGAA ACTAGAAACA GTTAAAGTGT 2340
CTAATAATGC TGAAGACCCC AAAGATCTCA TGTTAAGTGG AGAAAGGGTT TTGCAAACTG 2400
AAAGATCTGT AGAGAGTAGC AGTATTTCAC TGGTACCTGG TACTGATTAT GGCACTCAGG 2460
AAAGTATCTC GTTACTGGAA GTTAGCACTC TAGGGAAGGC AAAAACAGAA CCAAATAAAT 2520
GTGTGAGTCA GTGTGCAGCA TTTGAAAACC CCAAGGGACT AATTCATGGT TGTTCCAAAG 2580
ATAATAGAAA TGACACAGAA GGCTTTAAGT ATCCATTGGG ACATGAAGTT AACCACAGTC 2640
GGGAAACAAG CATAGAAATG GAAGAAAGTG AACTTGATGC TCAGTATTTG CAGAATACAT 2700
TCAAGGTTTC AAAGCGCCAG TCATTTGCTC TGTTTTCAAA TCCAGGAAAT GCAGAAGAGG 2760
AATGTGCAAC ATTCTCTGCC CACTCTGGGT CCTTAAAGAA ACAAAGTCCA AAAGTCACTT 2820
TTGAATGTGA ACAAAAGGAA GAAAATCAAG GAAAGAATGA GTCTAATATC AAGCCTGTAC 2880
AGACAGTTAA TATCACTGCA GGCTTTCCTG TGGTTGGTCA GAAAGATAAG CCAGTTGATA 2940
ATGCCAAATG TAGTATCAAA GGAGGCTCTA GGTTTTGTCT ATCATCTCAG TTCAGAGGCA 3000
ACGAAACTGG ACTCATTACT CCAAATAAAC ATGGACTTTT ACAAAACCCA TATCGTATAC 3060
CACCACTTTT TCCCATCAAG TCATTTGTTA AAACTAAATG TAAGAAAAAT CTGCTAGAGG 3120
AAAACTTTGA GGAACATTCA ATGTCACCTG AAAGAGAAAT GGGAAATGAG AACATTCCAA 3180
GTACAGTGAG CACAATTAGC CGTAATAACA TTAGAGAAAA TGTTTTTAAA GGAGCCAGCT 3240
CAAGCAATAT TAATGAAGTA GGTTCCAGTA CTAATGAAGT GGGCTCCAGT ATTAATGAAA 3300
TAGGTTCCAG TGATGAAAAC ATTCAAGCAG AACTAGGTAG AAACAGAGGG CCAAAATTGA 3360
ATGCTATGCT TAGATTAGGG GTTTTGCAAC CTGAGGTCTA TAAACAAAGT CTTCCTGGAA 3420
GTAATTGTAA GCATCCTGAA ATAAAAAAGC AAGAATATGA AGAAGTAGTT CAGACTGTTA 3480
ATACAGATTT CTCTCCATAT CTGATTTCAG ATAACTTAGA ACAGCCTATG GGAAGTAGTC 3540
ATGCATCTCA GGTTTGTTCT GAGACACCTG ATGACCTGTT AGATGATGGT GAAATAAAGG 3600
AAGATACTAG TTTTGCTGAA AATGACATTA AGGAAAGTTC TGCTGTTTTT AGCAAAAGCG 3660
TCCAGAGAGG AGAGCTTAGC AGGAGTCCTA GCCCTTTCAC CCATACACAT TTGGCTCAGG 3720
GTTACCGAAG AGGGGCCAAG AAATTAGAGT CCTCAGAAGA GAACTTATCT AGTGAGGATG 3780
AAGAGCTTCC CTGCTTCCAA CACTTGTTAT TTGGTAAAGT AAACAATATA CCTTCTCAGT 3840
CTACTAGGCA TAGCACCGTT GCTACCGAGT GTCTGTCTAA GAACACAGAG GAGAATTTAT 3900
TATCATTGAA GAATAGCTTA AATGACTGCA GTAACCAGGT AATATTGGCA AAGGCATCTC 3960
AGGAACATCA CCTTAGTGAG GAAACAAAAT GTTCTGCTAG CTTGTTTTCT TCACAGTGCA 4020
GTGAATTGGA AGACTTGACT GCAAATACAA ACACCCAGGA TCCTTTCTTG ATTGGTTCTT 4080
CCAAACAAAT GAGGCATCAG TCTGAAAGCC AGGGAGTTGG TCTGAGTGAC AAGGAATTGG 4140
TTTCAGATGA TGAAGAAAGA GGAACGGGCT TGGAAGAAAA TAATCAAGAA GAGCAAAGCA 4200
TGGATTCAAA CTTAGGTGAA GCAGCATCTG GGTGTGAGAG TGAAACAAGC GTCTCTGAAG 4260
ACTGCTCAGG GCTATCCTCT CAGAGTGACA TTTTAACCAC TCAGCAGAGG GATACCATGC 4320
AACATAACCT GATAAAGCTC CAGCAGGAAA TGGCTGAACT AGAAGCTGTG TTAGAACAGC 4380
ATGGGAGCCA GCCTTCTAAC AGCTACCCTT CCATCATAAG TGACTCCTCT GCCCTTGAGG 4440
ACCTGCGAAA TCCAGAACAA AGCACATCAG AAAAAGCAGT ATTAACTTCA CAGAAAAGTA 4500
GTGAATACCC TATAAGCCAG AATCCAGAAG GCCTTTCTGC TGACAAGTTT GAGGTGTCTG 4560
CAGATAGTTC TACCAGTAAA AATAAAGAAC CAGGAGTGGA AAGGTCATCC CCTTCTAAAT 4620
GCCCATCATT AGATGATAGG TGGTACATGC ACAGTTGCTC TGGGAGTCTT CAGAATAGAA 4680
ACTACCCATC TCAAGAGGAG CTCATTAAGG TTGTTGATGT GGAGGAGCAA CAGCTGGAAG 4740
AGTCTGGGCC ACACGATTTG ACGGAAACAT CTTACTTGCC AAGGCAAGAT CTAGAGGGAA 4800
CCCCTTACCT GGAATCTGGA ATCAGCCTCT TCTCTGATGA CCCTGAATCT GATCCTTCTG 4860
AAGACAGAGC CCCAGAGTCA GCTCGTGTTG GCAACATACC ATCTTCAACC TCTGCATTGA 4920
AAGTTCCCCA ATTGAAAGTT GCAGAATCTG CCCAGGGTCC AGCTGCTGCT CATACTACTG 4980
ATACTGCTGG GTATAATGCA ATGGAAGAAA GTGTGAGCAG GGAGAAGCCA GAATTGACAG 5040
CTTCAACAGA AAGGGTCAAC AAAAGAATGT CCATGGTGGT GTCTGGCCTG ACCCCAGAAG 5100
AATTTATGCT CGTGTACAAG TTTGCCAGAA AACACCACAT CACTTTAACT AATCTAATTA 5160
CTGAAGAGAC TACTCATGTT GTTATGAAAA CAGATGCTGA GTTTGTGTGT GAACGGACAC 5220
TGAAATATTT TCTAGGAATT GCGGGAGGAA AATGGGTAGT TAGCTATTTC TGGGTGACCC 5280
AGTCTATTAA AGAAAGAAAA ATGCTGAATG AGCATGATTT TGAAGTCAGA GGAGATGTGG 5340
TCAATGGAAG AAACCACCAA GGTCCAAAGC GAGCAAGAGA ATCCCAGGAC AGAAAGATCT 5400
TCAGGGGGCT AGAAATCTGT TGCTATGGGC CCTTCACCAA CATGCCCACA GATCAACTGG 5460
AATGGATGGT ACAGCTGTGT GGTGCTTCTG TGGTGAAGGA GCTTTCATCA TTCACCCTTG 5520
GCACAGGTGT CCACCCAATT GTGGTTGTGC AGCCAGATGC CTGGACAGAG GACAATGGCT 5580
TCCATGCAAT TGGGCAGATG TGTGAGGCAC CTGTGGTGAC CCGAGAGTGG GTGTTGGACA 5640
GTGTAGCACT CTACCAGTGC CAGGAGCTGG ACACCTACCT GATACCCCAG ATCCCCCACA 5700
GCCACTACTG A 5711
42) INFORMATION FOR SEQ ID NO : 2 : 

j (i) SEQUENCE CHARACTERISTICS: 

!3 (A) LENGTH: 1863 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : not relevant 

(D) TOPOLOGY: not relevant 

(ii) MOLECULE TYPE: protein 



(vi) ORIGINAL SOURCE: 

(A) ORGANISM: Homo sapiens 

(B) STRAIN: BRCA1 

(viii) POSITION IN GENOME : 

(A) CHROMOSOME/ SEGMENT: 17 

(B) MAP POSITION: 17q21 
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(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 2 : 



Met Asp Leu Ser 
1 

Ala Met Gin Lys 
20 

Glu Pro Val Ser 
35 

Leu Lys Leu Leu 
50 

Lys Asn Asp lie 
65 

Gin Leu Val Glu 



Thr Gly Leu Glu 
100 

Asn Ser Pro Glu 
115 

Gly Tyr Arg Asn 
130 

Pro Ser Leu Gin 
145 



Ala Leu Arg Val 
5 

lie Leu Glu Cys 



Thr Lys Cys Asp 
40 

Asn Gin Lys Lys 
55 

Thr Lys Arg Ser 
70 

Glu Leu Leu Lys 
85 

Tyr Ala Asn Ser 



His Leu Lys Asp 
120 

Arg Ala Lys Arg 
135 

Glu Thr Ser Leu 
150 



Glu Glu Val Gin 
10 

Pro lie Cys Leu 
25 

His lie Phe Cys 



Gly Pro Ser Gin 
60 

Leu Gin Glu Ser 
75 

lie lie Cys Ala 
90 

Tyr Asn Phe Ala 
105 

Glu Val Ser lie 



Leu Leu Gin Ser 
140 

Ser Val Gin Leu 
155 



Asn Val lie Asn 
15 

Glu Leu lie Lys 
30 

Lys Phe Cys Met 
45 

Cys Pro Leu Cys 



Thr Arg Phe. Ser 
80 

Phe Gin Leu Asp 
95 

Lys Lys Glu Asn 
110 

lie Gin Ser Met 
125 

Glu Pro Glu Asn 



Ser Asn Leu Gly 
160 



Thr Val Arg Thr Leu Arg Thr Lys Gin Arg lie Gin Pro Gin Lys Thr 

165 170 175 * 

Ser Val Tyr lie Glu Leu Gly Ser Asp Ser Ser Glu Asp Thr Val Asn 

180 185 ' 190 
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Lys Ala Thr Tyr Cys Ser Val Gly Asp Gin Glu Leu Leu Gin lie Thr 
195 200 ' 205 



Pro Gin Gly Thr Arg 
210 

Ala Cys Glu Phe Ser 
225 

Pro Ser Asn Asn Asp 

245 

His Pro Glu Lys Tyr 
2 60 

Pro Cys Gly Thr Asn 
275 

Ser Leu Leu Leu Thr 
290 

Cys Asn Lys Ser Lys 
305 

Trp Ala Gly Ser Lys 

325 

Glu Lys Lys Val Asp 
340 

Trp Asn Lys Gin Lys 
355 

Asp Val Pro Trp lie 
370 

Trp Phe Ser Arg Ser 
385 

Gly Glu Ser Glu Ser 



Asp Glu lie Ser Leu Asp 
215 

Glu Thr Asp Val Thr Asn 

230 235 

Leu Asn Thr Thr Glu Lys 

250 

Gin Gly Ser Ser Val Ser 
265 

Thr His Ala Ser Ser Leu 
280 

Lys Asp Arg Met Asn Val 
295 

Gin Pro Gly Leu Ala Arg 
310 315 

Glu Thr Cys Asn Asp Arg 

330 

Leu Asn Ala Asp_ Pro Leu 
345 

Leu Pro Cys Ser Glu Asn 
360 

Thr Leu Asn Ser Ser lie 
375 

Asp Glu Leu Leu Gly Ser 
390 395 

Asn Ala Lys Val Ala Asp 



Ser Ala Lys Lys Ala 
220 

Thr Glu His His Gin 

240 

Arg Ala Ala Glu Arg 
255 

Asn Leu His Val Glu 
270 

Gin His Glu Asn Ser 
285 

Glu Lys Ala Glu Phe 
300 

Ser Gin His Asn Arg 

320 

Arg Thr Pro Ser Thr 
335 

Cys Glu Arg Lys Glu 
350 

Pro Arg Asp Thr Glu 
365 

Gin Lys Val Asn Glu 
380 

Asp Asp Ser His Asp 

400 

Val Leu Asp Val Leu 
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405 410 415 

Asn Glu Val Asp Glu Tyr Ser Gly Ser Ser Glu Lys lie Asp Leu Leu 
420 425 430 

Ala Ser Asp Pro His Glu Ala Leu lie Cys Lys Ser Glu Arg Val His 
435 440 445 

Ser Lys Ser Val Glu Ser Asn lie Glu Asp Lys lie Phe Gly Lys Thr 
450 455 460 

Tyr Arg Lys Lys Ala Ser Leu Pro Asn Leu Ser His Val Thr Glu Asn 
465 470 475 480 

Leu lie lie Gly Ala Phe Val Thr Glu Pro Gin lie lie Gin Glu Arg 

485 490 495 

Pro Leu Thr Asn Lys Leu Lys Arg Lys Arg Arg Pro Thr Ser Gly Leu 
500 505 510 

His Pro Glu Asp Phe lie Lys Lys Ala Asp Leu Ala Val Gin Lys Thr 
515 520 525 

Pro Glu Met lie Asn Gin Gly Thr Asn Gin Thr Glu Gin Asn Gly Gin 
530 535 540 

Val Met Asn lie Thr Asn Ser Gly His Glu Asn Lys Thr Lys Gly Asp 
545 550 555 560 

Ser lie Gin Asn Glu Lys Asn Pro Asn Pro lie Glu Ser Leu Glu Lys 

565 570 575 

Glu Ser Ala Phe Lys Thr Lys Ala Glu Pro lie Ser Ser Ser lie Ser 

580 585 590 

Asn Met Glu Leu Glu Leu Asn lie His Asn Ser Lys Ala Pro Lys Lys 
595 600 605 

Asn Arg Leu Arg Arg Lys Ser Ser. Thr Arg His lie His Ala Leu Glu 
610 615 620 
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Leu Val Val Ser Arg Asn Leu Ser Pro Pro Asn Cys Thr Glu Leu Gin 
625 630 - 635 640 

lie Asp Ser Cys Ser Ser Ser Glu Glu lie Lys Lys Lys Lys Tyr Asn 

645 650 655 

Gin Met Pro Val Arg His Ser Arg Asn Leu Gin Leu Met Glu Gly Lys 
660 665 670 

Glu Pro Ala Thr Gly Ala Lys Lys Ser Asn Lys Pro Asn Glu Gin Thr 
675 680 685 

Ser Lys Arg His Asp Ser Asp Thr Phe Pro Glu Leu Lys Leu Thr Asn 
690 695 700 

Ala Pro Gly Ser Phe Thr Lys Cys Ser Asn Thr Ser Glu Leu Lys Glu 
705 . 710 715 720 

Phe Val Asn Pro Ser Leu Pro Arg Glu Glu Lys Glu Glu Lys Leu Glu 

725 730 735 

Thr Val Lys Val Ser Asn Asn Ala Glu Asp Pro Lys Asp Leu Met Leu 
740 745 750 

Ser Gly Glu Arg Val Leu Gin Thr Glu Arg Ser Val Glu Ser Ser Ser 
755 760 765 

lie Ser Leu Val Pro Gly Thr Asp Tyr Gly Thr Gin Glu Ser lie Ser 
770 775 780 

Leu Leu Glu Val Ser Thr Leu Gly Lys Ala Lys Thr Glu Pro Asn Lys 
785 790 795 800 

Cys Val Ser Gin Cys Ala Ala Phe Glu Asn Pro Lys Gly Leu lie His 

805 810 815 

Gly Cys Ser Lys Asp Asn Arg Asn Asp Thr Glu Gly Phe Lys Tyr Pro 
820 825 830 

Leu Gly His Glu Val Asn His Ser Arg Glu Thr Ser lie Glu Met Glu 
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835 840 845 

Glu Ser Glu Leu Asp Ala Gin Tyr Leu -Gin Asn Thr. Phe Lys Val Ser 
850 855 860 

Lys Arg Gin Ser Phe Ala Leu Phe Ser Asn Pro Gly Asn Ala Glu Glu 
865 87 ° 875 880 

Glu Cys Ala Thr Phe Ser Ala His Ser Gly Ser Leu Lys Lys Gin Ser 

885 890 895 

Pro Lys Val Thr Phe Glu Cys Glu Gin Lys Glu Glu Asn Gin Gly Lys 
900 905 910 

Asn Glu Ser Asn He Lys Pro Val Gin Thr Val Asn He Thr Ala Gly 
915 920 . 925 

Phe Pro Val Val Gly Gin Lys Asp Lys Pro Val Asp Asn Ala Lys Cys 
930 935 940 

Ser He Lys Gly Gly Ser Arg Phe Cys Leu Ser Ser Gin Phe Arg Gly 
945 950 955 9 6 o 

Asn Glu Thr Gly Leu He Thr Pro Asn Lys His Gly Leu Leu Gin Asn 

965 970 975 

Pro Tyr Arg He Pro Pro Leu Phe Pro He Lys Ser Phe Val Lys Thr 
980 985 990 

Lys Cys Lys Lys Asn Leu Leu Glu Glu Asn Phe Glu Glu His Ser Met 
995 1000 1005 

Ser Pro Glu Arg Glu Met Gly Asn Glu Asn He Pro Ser Thr Val Ser 
1010 1015 1020 

Thr He Ser Arg Asn Asn He Arg Glu Asn Val Phe Lys Gly Ala Ser 
1° 25 1030 1035 1040 

Ser Ser Asn He Asn Glu Val Gly Ser Ser Thr Asn Glu Val Gly Ser 

1045 1050 loss 
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Ser lie Asn Glu lie 
1060 

Gly Arg Asn Arg Gly 
1075 

Leu Gin Pro Glu Val 
1090 

His Pro Glu lie Lys 
1105 

Asn Thr Asp Phe Ser 

1125 

Met Gly Ser Ser His 
1140 

Leu Leu Asp Asp Gly 
1155 

Asp lie Lys Glu Ser 
1170 

Glu Leu Ser Arg Ser 
1185 

Gly Tyr Arg Arg Gly 

120! 



Gly Ser Ser Asp Glu Asn 
1065 

Pro Lys Leu Asn Ala Met 
1080 

Tyr Lys Gin Ser Leu Pro 
1095 

Lys Gin Glu Tyr Glu Glu 
1110 111: 

Pro Tyr Leu lie Ser Asp 

1130 

Ala Ser Gin Val Cys Ser 
1145 

Glu lie Lys Glu Asp Thr 
1160 

Ser Ala Val Phe Ser Lys 
1175 

Pro Ser Pro Phe Thr His 
1190 119! 
Ala Lys Lys Leu Glu Ser 

1210 



lie Gin Ala Glu Leu 
1070 

Leu Arg Leu Gly Val 
1085 

Gly Ser Asn Cys Lys 
1100 

Val Val Gin Thr Val 

1120 

Asn Leu Glu Gin Pro 
1135 

Glu Thr Pro Asp Asp 
1150 

Ser Phe Ala Glu Asn 
1165 

Ser Val Gin Arg Gly 
1180 

Thr His Leu Ala Gin 

1200 

Ser Glu Glu Asn Leu 
' 1215 



Ser Ser Glu Asp Glu Glu Leu Pro Cys Phe Gln His Leu Leu Phe Gly
1220 1225 1230

Lys Val Asn Asn Ile Pro Ser Gln Ser Thr Arg His Ser Thr Val Ala
1235 1240 1245

Thr Glu Cys Leu Ser Lys Asn Thr Glu Glu Asn Leu Leu Ser Leu Lys
1250 1255 1260

Asn Ser Leu Asn Asp Cys Ser Asn Gln Val Ile Leu Ala Lys Ala Ser
1265 1270 1275 1280

Gln Glu His His Leu Ser Glu Glu Thr Lys Cys Ser Ala Ser Leu Phe
1285 1290 1295

Ser Ser Gln Cys Ser Glu Leu Glu Asp Leu Thr Ala Asn Thr Asn Thr
1300 1305 1310

Gln Asp Pro Phe Leu Ile Gly Ser Ser Lys Gln Met Arg His Gln Ser
1315 1320 1325

Glu Ser Gln Gly Val Gly Leu Ser Asp Lys Glu Leu Val Ser Asp Asp
1330 1335 1340

Glu Glu Arg Gly Thr Gly Leu Glu Glu Asn Asn Gln Glu Glu Gln Ser
1345 1350 1355 1360

Met Asp Ser Asn Leu Gly Glu Ala Ala Ser Gly Cys Glu Ser Glu Thr
1365 1370 1375

Ser Val Ser Glu Asp Cys Ser Gly Leu Ser Ser Gln Ser Asp Ile Leu
1380 1385 1390

Thr Thr Gln Gln Arg Asp Thr Met Gln His Asn Leu Ile Lys Leu Gln
1395 1400 1405

Gln Glu Met Ala Glu Leu Glu Ala Val Leu Glu Gln His Gly Ser Gln
1410 1415 1420

Pro Ser Asn Ser Tyr Pro Ser Ile Ile Ser Asp Ser Ser Ala Leu Glu
1425 1430 1435 1440

Asp Leu Arg Asn Pro Glu Gln Ser Thr Ser Glu Lys Ala Val Leu Thr
1445 1450 1455

Ser Gln Lys Ser Ser Glu Tyr Pro Ile Ser Gln Asn Pro Glu Gly Leu
1460 1465 1470

Ser Ala Asp Lys Phe Glu Val Ser Ala Asp Ser Ser Thr Ser Lys Asn
1475 1480 1485

Lys Glu Pro Gly Val Glu Arg Ser Ser Pro Ser Lys Cys Pro Ser Leu
1490 1495 1500

Asp Asp Arg Trp Tyr Met His Ser Cys Ser Gly Ser Leu Gln Asn Arg
1505 1510 1515 1520

Asn Tyr Pro Ser Gln Glu Glu Leu Ile Lys Val Val Asp Val Glu Glu
1525 1530 1535

Gln Gln Leu Glu Glu Ser Gly Pro His Asp Leu Thr Glu Thr Ser Tyr
1540 1545 1550

Leu Pro Arg Gln Asp Leu Glu Gly Thr Pro Tyr Leu Glu Ser Gly Ile
1555 1560 1565

Ser Leu Phe Ser Asp Asp Pro Glu Ser Asp Pro Ser Glu Asp Arg Ala
1570 1575 1580

Pro Glu Ser Ala Arg Val Gly Asn Ile Pro Ser Ser Thr Ser Ala Leu
1585 1590 1595 1600

Lys Val Pro Gln Leu Lys Val Ala Glu Ser Ala Gln Gly Pro Ala Ala
1605 1610 1615

Ala His Thr Thr Asp Thr Ala Gly Tyr Asn Ala Met Glu Glu Ser Val
1620 1625 1630

Ser Arg Glu Lys Pro Glu Leu Thr Ala Ser Thr Glu Arg Val Asn Lys
1635 1640 1645

Arg Met Ser Met Val Val Ser Gly Leu Thr Pro Glu Glu Phe Met Leu
1650 1655 1660

Val Tyr Lys Phe Ala Arg Lys His His Ile Thr Leu Thr Asn Leu Ile
1665 1670 1675 1680

Thr Glu Glu Thr Thr His Val Val Met Lys Thr Asp Ala Glu Phe Val
1685 1690 1695

Cys Glu Arg Thr Leu Lys Tyr Phe Leu Gly Ile Ala Gly Gly Lys Trp
1700 1705 1710

Val Val Ser Tyr Phe Trp Val Thr Gln Ser Ile Lys Glu Arg Lys Met
1715 1720 1725

Leu Asn Glu His Asp Phe Glu Val Arg Gly Asp Val Val Asn Gly Arg
1730 1735 1740

Asn His Gln Gly Pro Lys Arg Ala Arg Glu Ser Gln Asp Arg Lys Ile
1745 1750 1755 1760

Phe Arg Gly Leu Glu Ile Cys Cys Tyr Gly Pro Phe Thr Asn Met Pro
1765 1770 1775

Thr Asp Gln Leu Glu Trp Met Val Gln Leu Cys Gly Ala Ser Val Val
1780 1785 1790

Lys Glu Leu Ser Ser Phe Thr Leu Gly Thr Gly Val His Pro Ile Val
1795 1800 1805

Val Val Gln Pro Asp Ala Trp Thr Glu Asp Asn Gly Phe His Ala Ile
1810 1815 1820

Gly Gln Met Cys Glu Ala Pro Val Val Thr Arg Glu Trp Val Leu Asp
1825 1830 1835 1840

Ser Val Ala Leu Tyr Gln Cys Gln Glu Leu Asp Thr Tyr Leu Ile Pro
1845 1850 1855

Gln Ile Pro His Ser His Tyr
1860



(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5711 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(B) STRAIN: BRCA1

(viii) POSITION IN GENOME:

(A) CHROMOSOME/SEGMENT: 17

(B) MAP POSITION: 17q21



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:

AGCTCGCTGA GACTTCCTGG ACCCCGCACC AGGCTGTGGG GTTTCTCAGA TAACTGGGCC 60
CCTGCGCTCA GGAGGCCTTC ACCCTCTGCT CTGGGTAAAG TTCATTGGAA CAGAAAGAAA 120
TGGATTTATC TGCTCTTCGC GTTGAAGAAG TACAAAATGT CATTAATGCT ATGCAGAAAA 180
TCTTAGAGTG TCCCATCTGT CTGGAGTTGA TCAAGGAACC TGTCTCCACA AAGTGTGACC 240
ACATATTTTG CAAATTTTGC ATGCTGAAAC TTCTCAACCA GAAGAAAGGG CCTTCACAGT 300
GTCCTTTATG TAAGAATGAT ATAACCAAAA GGAGCCTACA AGAAAGTACG AGATTTAGTC 360
AACTTGTTGA AGAGCTATTG AAAATCATTT GTGCTTTTCA GCTTGACACA GGTTTGGAGT 420
ATGCAAACAG CTATAATTTT GCAAAAAAGG AAAATAACTC TCCTGAACAT CTAAAAGATG 480
AAGTTTCTAT CATCCAAAGT ATGGGCTACA GAAACCGTGC CAAAAGACTT CTACAGAGTG 540
AACCCGAAAA TCCTTCCTTG CAGGAAACCA GTCTCAGTGT CCAACTCTCT AACCTTGGAA 600
CTGTGAGAAC TCTGAGGACA AAGCAGCGGA TACAACCTCA AAAGACGTCT GTCTACATTG 660
AATTGGGATC TGATTCTTCT GAAGATACCG TTAATAAGGC AACTTATTGC AGTGTGGGAG 720
ATCAAGAATT GTTACAAATC ACCCCTCAAG GAACCAGGGA TGAAATCAGT TTGGATTCTG 780



CAAAAAAGGC TGCTTGTGAA TTTTCTGAGA CGGATGTAAC AAATACTGAA CATCATCAAC 840
CCAGTAATAA TGATTTGAAC ACCACTGAGA AGCGTGCAGC TGAGAGGCAT CCAGAAAAGT 900
ATCAGGGTAG TTCTGTTTCA AACTTGCATG TGGAGCCATG TGGCACAAAT ACTCATGCCA 960
GCTCATTACA GCATGAGAAC AGCAGTTTAT TACTCACTAA AGACAGAATG AATGTAGAAA 1020
AGGCTGAATT CTGTAATAAA AGCAAACAGC CTGGCTTAGC AAGGAGCCAA CATAACAGAT 1080
GGGCTGGAAG TAAGGAAACA TGTAATGATA GGCGGACTCC CAGCACAGAA AAAAAGGTAG 1140
ATCTGAATGC TGATCCCCTG TGTGAGAGAA AAGAATGGAA TAAGCAGAAA CTGCCATGCT 1200
CAGAGAATCC TAGAGATACT GAAGATGTTC CTTGGATAAC ACTAAATAGC AGCATTCAGA 1260
AAGTTAATGA GTGGTTTTCC AGAAGTGATG AACTGTTAGG TTCTGATGAC TCACATGATG 1320
GGGAGTCTGA ATCAAATGCC AAAGTAGCTG ATGTATTGGA CGTTCTAAAT GAGGTAGATG 1380
AATATTCTGG TTCTTCAGAG AAAATAGACT TACTGGCCAG TGATCCTCAT GAGGCTTTAA 1440
TATGTAAAAG TGAAAGAGTT CACTCCAAAT CAGTAGAGAG TAATATTGAA GACAAAATAT 1500
TTGGGAAAAC CTATCGGAAG AAGGCAAGCC TCCCCAACTT AAGCCATGTA ACTGAAAATC 1560
TAATTATAGG AGCATTTGTT ACTGAGCCAC AGATAATACA AGAGCGTCCC CTCACAAATA 1620
AATTAAAGCG TAAAAGGAGA CCTACATCAG GCCTTCATCC TGAGGATTTT ATCAAGAAAG 1680
CAGATTTGGC AGTTCAAAAG ACTCCTGAAA TGATAAATCA GGGAACTAAC CAAACGGAGC 1740
AGAATGGTCA AGTGATGAAT ATTACTAATA GTGGTCATGA GAATAAAACA AAAGGTGATT 1800
CTATTCAGAA TGAGAAAAAT CCTAACCCAA TAGAATCACT CGAAAAAGAA TCTGCTTTCA 1860
AAACGAAAGC TGAACCTATA AGCAGCAGTA TAAGCAATAT GGAACTCGAA TTAAATATCC 1920
ACAATTCAAA AGCACCTAAA AAGAATAGGC TGAGGAGGAA GTCTTCTACC AGGCATATTC 1980



ATGCGCTTGA ACTAGTAGTC AGTAGAAATC TAAGCCCACC TAATTGTACT GAATTGCAAA 2040
TTGATAGTTG TTCTAGCAGT GAAGAGATAA AGAAAAAAAA GTACAACCAA ATGCCAGTCA 2100
GGCACAGCAG AAACCTACAA CTCATGGAAG GTAAAGAACC TGCAACTGGA GCCAAGAAGA 2160
GTAACAAGCC AAATGAACAG ACAAGTAAAA GACATGACAG CGATACTTTC CCAGAGCTGA 2220
AGTTAACAAA TGCACCTGGT TCTTTTACTA AGTGTTCAAA TACCAGTGAA CTTAAAGAAT 2280
TTGTCAATCC TAGCCTTCCA AGAGAAGAAA AAGAAGAGAA ACTAGAAACA GTTAAAGTGT 2340
CTAATAATGC TGAAGACCCC AAAGATCTCA TGTTAAGTGG AGAAAGGGTT TTGCAAACTG 2400
AAAGATCTGT AGAGAGTAGC AGTATTTCAT TGGTACCTGG TACTGATTAT GGCACTCAGG 2460
AAAGTATCTC GTTACTGGAA GTTAGCACTC TAGGGAAGGC AAAAACAGAA CCAAATAAAT 2520
GTGTGAGTCA GTGTGCAGCA TTTGAAAACC CCAAGGGACT AATTCATGGT TGTTCCAAAG 2580
ATAATAGAAA TGACACAGAA GGCTTTAAGT ATCCATTGGG ACATGAAGTT AACCACAGTC 2640
GGGAAACAAG CATAGAAATG GAAGAAAGTG AACTTGATGC TCAGTATTTG CAGAATACAT 2700
TCAAGGTTTC AAAGCGCCAG TCATTTGCTC TGTTTTCAAA TCCAGGAAAT GCAGAAGAGG 2760
AATGTGCAAC ATTCTCTGCC CACTCTGGGT CCTTAAAGAA ACAAAGTCCA AAAGTCACTT 2820
TTGAATGTGA ACAAAAGGAA GAAAATCAAG GAAAGAATGA GTCTAATATC AAGCCTGTAC 2880
AGACAGTTAA TATCACTGCA GGCTTTCCTG TGGTTGGTCA GAAAGATAAG CCAGTTGATA 2940
ATGCCAAATG TAGTATCAAA GGAGGCTCTA GGTTTTGTCT ATCATCTCAG TTCAGAGGCA 3000
ACGAAACTGG ACTCATTACT CCAAATAAAC ATGGACTTTT ACAAAACCCA TATCGTATAC 3060
CACCACTTTT TCCCATCAAG TCATTTGTTA AAACTAAATG TAAGAAAAAT CTGCTAGAGG 3120
AAAACTTTGA GGAACATTCA ATGTCACCTG AAAGAGAAAT GGGAAATGAG AACATTCCAA 3180
GTACAGTGAG CACAATTAGC CGTAATAACA TTAGAGAAAA TGTTTTTAAA GAAGCCAGCT 3240
CAAGCAATAT TAATGAAGTA GGTTCCAGTA CTAATGAAGT GGGCTCCAGT ATTAATGAAA 3300
TAGGTTCCAG TGATGAAAAC ATTCAAGCAG AACTAGGTAG AAACAGAGGG CCAAAATTGA 3360
ATGCTATGCT TAGATTAGGG GTTTTGCAAC CTGAGGTCTA TAAACAAAGT CTTCCTGGAA 3420
GTAATTGTAA GCATCCTGAA ATAAAAAAGC AAGAATATGA AGAAGTAGTT CAGACTGTTA 3480
ATACAGATTT CTCTCCATAT CTGATTTCAG ATAACTTAGA ACAGCCTATG GGAAGTAGTC 3540
ATGCATCTCA GGTTTGTTCT GAGACACCTG ATGACCTGTT AGATGATGGT GAAATAAAGG 3600
AAGATACTAG TTTTGCTGAA AATGACATTA AGGAAAGTTC TGCTGTTTTT AGCAAAAGCG 3660
TCCAGAAAGG AGAGCTTAGC AGGAGTCCTA GCCCTTTCAC CCATACACAT TTGGCTCAGG 3720
GTTACCGAAG AGGGGCCAAG AAATTAGAGT CCTCAGAAGA GAACTTATCT AGTGAGGATG 3780
AAGAGCTTCC CTGCTTCCAA CACTTGTTAT TTGGTAAAGT AAACAATATA CCTTCTCAGT 3840
CTACTAGGCA TAGCACCGTT GCTACCGAGT GTCTGTCTAA GAACACAGAG GAGAATTTAT 3900
TATCATTGAA GAATAGCTTA AATGACTGCA GTAACCAGGT AATATTGGCA AAGGCATCTC 3960
AGGAACATCA CCTTAGTGAG GAAACAAAAT GTTCTGCTAG CTTGTTTTCT TCACAGTGCA 4020
GTGAATTGGA AGACTTGACT GCAAATACAA ACACCCAGGA TCCTTTCTTG ATTGGTTCTT 4080
CCAAACAAAT GAGGCATCAG TCTGAAAGCC AGGGAGTTGG TCTGAGTGAC AAGGAATTGG 4140
TTTCAGATGA TGAAGAAAGA GGAACGGGCT TGGAAGAAAA TAATCAAGAA GAGCAAAGCA 4200
TGGATTCAAA CTTAGGTGAA GCAGCATCTG GGTGTGAGAG TGAAACAAGC GTCTCTGAAG 4260
ACTGCTCAGG GCTATCCTCT CAGAGTGACA TTTTAACCAC TCAGCAGAGG GATACCATGC 4320
AACATAACCT GATAAAGCTC CAGCAGGAAA TGGCTGAACT AGAAGCTGTG TTAGAACAGC 4380
ATGGGAGCCA GCCTTCTAAC AGCTACCCTT CCATCATAAG TGACTCTTCT GCCCTTGAGG 4440
ACCTGCGAAA TCCAGAACAA AGCACATCAG AAAAAGCAGT ATTAACTTCA CAGAAAAGTA 4500
GTGAATACCC TATAAGCCAG AATCCAGAAG GCCTTTCTGC TGACAAGTTT GAGGTGTCTG 4560
CAGATAGTTC TACCAGTAAA AATAAAGAAC CAGGAGTGGA AAGGTCATCC CCTTCTAAAT 4620
GCCCATCATT AGATGATAGG TGGTACATGC ACAGTTGCTC TGGGAGTCTT CAGAATAGAA 4680
ACTACCCATC TCAAGAGGAG CTCATTAAGG TTGTTGATGT GGAGGAGCAA CAGCTGGAAG 4740
AGTCTGGGCC ACACGATTTG ACGGAAACAT CTTACTTGCC AAGGCAAGAT CTAGAGGGAA 4800
CCCCTTACCT GGAATCTGGA ATCAGCCTCT TCTCTGATGA CCCTGAATCT GATCCTTCTG 4860
AAGACAGAGC CCCAGAGTCA GCTCGTGTTG GCAACATACC ATCTTCAACC TCTGCATTGA 4920


AAGTTCCCCA ATTGAAAGTT GCAGAATCTG CCCAGAGTCC AGCTGCTGCT CATACTACTG 4980
ATACTGCTGG GTATAATGCA ATGGAAGAAA GTGTGAGCAG GGAGAAGCCA GAATTGACAG 5040



CTTCAACAGA AAGGGTCAAC AAAAGAATGT CCATGGTGGT GTCTGGCCTG ACCCCAGAAG 5100
AATTTATGCT CGTGTACAAG TTTGCCAGAA AACACCACAT CACTTTAACT AATCTAATTA 5160
CTGAAGAGAC TACTCATGTT GTTATGAAAA CAGATGCTGA GTTTGTGTGT GAACGGACAC 5220
TGAAATATTT TCTAGGAATT GCGGGAGGAA AATGGGTAGT TAGCTATTTC TGGGTGACCC 5280
AGTCTATTAA AGAAAGAAAA ATGCTGAATG AGCATGATTT TGAAGTCAGA GGAGATGTGG 5340
TCAATGGAAG AAACCACCAA GGTCCAAAGC GAGCAAGAGA ATCCCAGGAC AGAAAGATCT 5400
TCAGGGGGCT AGAAATCTGT TGCTATGGGC CCTTCACCAA CATGCCCACA GATCAACTGG 5460
AATGGATGGT ACAGCTGTGT GGTGCTTCTG TGGTGAAGGA GCTTTCATCA TTCACCCTTG 5520
GCACAGGTGT CCACCCAATT GTGGTTGTGC AGCCAGATGC CTGGACAGAG GACAATGGCT 5580
TCCATGCAAT TGGGCAGATG TGTGAGGCAC CTGTGGTGAC CCGAGAGTGG GTGTTGGACA 5640
GTGTAGCACT CTACCAGTGC CAGGAGCTGG ACACCTACCT GATACCCCAG ATCCCCCACA 5700
GCCACTACTG A 5711

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1863 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(B) STRAIN: BRCA1

(viii) POSITION IN GENOME:

(A) CHROMOSOME/SEGMENT: 17

(B) MAP POSITION: 17q21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

Met Asp Leu Ser Ala Leu Arg Val Glu Glu Val Gln Asn Val Ile Asn
1 5 10 15

Ala Met Gln Lys Ile Leu Glu Cys Pro Ile Cys Leu Glu Leu Ile Lys
20 25 30

Glu Pro Val Ser Thr Lys Cys Asp His Ile Phe Cys Lys Phe Cys Met
35 40 45

Leu Lys Leu Leu Asn Gln Lys Lys Gly Pro Ser Gln Cys Pro Leu Cys
50 55 60

Lys Asn Asp Ile Thr Lys Arg Ser Leu Gln Glu Ser Thr Arg Phe Ser
65 70 75 80

Gln Leu Val Glu Glu Leu Leu Lys Ile Ile Cys Ala Phe Gln Leu Asp
85 90 95

Thr Gly Leu Glu Tyr Ala Asn Ser Tyr Asn Phe Ala Lys Lys Glu Asn
100 105 110

Asn Ser Pro Glu His Leu Lys Asp Glu Val Ser Ile Ile Gln Ser Met
115 120 125

Gly Tyr Arg Asn Arg Ala Lys Arg Leu Leu Gln Ser Glu Pro Glu Asn
130 135 140

Pro Ser Leu Gln Glu Thr Ser Leu Ser Val Gln Leu Ser Asn Leu Gly
145 150 155 160

Thr Val Arg Thr Leu Arg Thr Lys Gln Arg Ile Gln Pro Gln Lys Thr
165 170 175

Ser Val Tyr Ile Glu Leu Gly Ser Asp Ser Ser Glu Asp Thr Val Asn
180 185 190

Lys Ala Thr Tyr Cys Ser Val Gly Asp Gln Glu Leu Leu Gln Ile Thr
195 200 205

Pro Gln Gly Thr Arg Asp Glu Ile Ser Leu Asp Ser Ala Lys Lys Ala
210 215 220

Ala Cys Glu Phe Ser Glu Thr Asp Val Thr Asn Thr Glu His His Gln
225 230 235 240

Pro Ser Asn Asn Asp Leu Asn Thr Thr Glu Lys Arg Ala Ala Glu Arg
245 250 255

His Pro Glu Lys Tyr Gln Gly Ser Ser Val Ser Asn Leu His Val Glu
260 265 270

Pro Cys Gly Thr Asn Thr His Ala Ser Ser Leu Gln His Glu Asn Ser
275 280 285

Ser Leu Leu Leu Thr Lys Asp Arg Met Asn Val Glu Lys Ala Glu Phe
290 295 300

Cys Asn Lys Ser Lys Gln Pro Gly Leu Ala Arg Ser Gln His Asn Arg
305 310 315 320

Trp Ala Gly Ser Lys Glu Thr Cys Asn Asp Arg Arg Thr Pro Ser Thr
325 330 335

Glu Lys Lys Val Asp Leu Asn Ala Asp Pro Leu Cys Glu Arg Lys Glu
340 345 350

Trp Asn Lys Gln Lys Leu Pro Cys Ser Glu Asn Pro Arg Asp Thr Glu
355 360 365

Asp Val Pro Trp Ile Thr Leu Asn Ser Ser Ile Gln Lys Val Asn Glu
370 375 380

Trp Phe Ser Arg Ser Asp Glu Leu Leu Gly Ser Asp Asp Ser His Asp
385 390 395 400

Gly Glu Ser Glu Ser Asn Ala Lys Val Ala Asp Val Leu Asp Val Leu
405 410 415

Asn Glu Val Asp Glu Tyr Ser Gly Ser Ser Glu Lys Ile Asp Leu Leu
420 425 430

Ala Ser Asp Pro His Glu Ala Leu Ile Cys Lys Ser Glu Arg Val His
435 440 445

Ser Lys Ser Val Glu Ser Asn Ile Glu Asp Lys Ile Phe Gly Lys Thr
450 455 460

Tyr Arg Lys Lys Ala Ser Leu Pro Asn Leu Ser His Val Thr Glu Asn
465 470 475 480

Leu Ile Ile Gly Ala Phe Val Thr Glu Pro Gln Ile Ile Gln Glu Arg
485 490 495

Pro Leu Thr Asn Lys Leu Lys Arg Lys Arg Arg Pro Thr Ser Gly Leu
500 505 510

His Pro Glu Asp Phe Ile Lys Lys Ala Asp Leu Ala Val Gln Lys Thr
515 520 525

Pro Glu Met Ile Asn Gln Gly Thr Asn Gln Thr Glu Gln Asn Gly Gln
530 535 540

Val Met Asn Ile Thr Asn Ser Gly His Glu Asn Lys Thr Lys Gly Asp
545 550 555 560

Ser Ile Gln Asn Glu Lys Asn Pro Asn Pro Ile Glu Ser Leu Glu Lys
565 570 575

Glu Ser Ala Phe Lys Thr Lys Ala Glu Pro Ile Ser Ser Ser Ile Ser
580 585 590

Asn Met Glu Leu Glu Leu Asn Ile His Asn Ser Lys Ala Pro Lys Lys
595 600 605

Asn Arg Leu Arg Arg Lys Ser Ser Thr Arg His Ile His Ala Leu Glu
610 615 620

Leu Val Val Ser Arg Asn Leu Ser Pro Pro Asn Cys Thr Glu Leu Gln
625 630 635 640

Ile Asp Ser Cys Ser Ser Ser Glu Glu Ile Lys Lys Lys Lys Tyr Asn
645 650 655

Gln Met Pro Val Arg His Ser Arg Asn Leu Gln Leu Met Glu Gly Lys
660 665 670

Glu Pro Ala Thr Gly Ala Lys Lys Ser Asn Lys Pro Asn Glu Gln Thr
675 680 685

Ser Lys Arg His Asp Ser Asp Thr Phe Pro Glu Leu Lys Leu Thr Asn
690 695 700



Ala Pro Gly Ser Phe Thr Lys Cys Ser Asn Thr Ser Glu Leu Lys Glu
705 710 715 720

Phe Val Asn Pro Ser Leu Pro Arg Glu Glu Lys Glu Glu Lys Leu Glu
725 730 735

Thr Val Lys Val Ser Asn Asn Ala Glu Asp Pro Lys Asp Leu Met Leu
740 745 750

Ser Gly Glu Arg Val Leu Gln Thr Glu Arg Ser Val Glu Ser Ser Ser
755 760 765

Ile Ser Leu Val Pro Gly Thr Asp Tyr Gly Thr Gln Glu Ser Ile Ser
770 775 780

Leu Leu Glu Val Ser Thr Leu Gly Lys Ala Lys Thr Glu Pro Asn Lys
785 790 795 800

Cys Val Ser Gln Cys Ala Ala Phe Glu Asn Pro Lys Gly Leu Ile His
805 810 815

Gly Cys Ser Lys Asp Asn Arg Asn Asp Thr Glu Gly Phe Lys Tyr Pro
820 825 830

Leu Gly His Glu Val Asn His Ser Arg Glu Thr Ser Ile Glu Met Glu
835 840 845

Glu Ser Glu Leu Asp Ala Gln Tyr Leu Gln Asn Thr Phe Lys Val Ser
850 855 860

Lys Arg Gln Ser Phe Ala Leu Phe Ser Asn Pro Gly Asn Ala Glu Glu
865 870 875 880

Glu Cys Ala Thr Phe Ser Ala His Ser Gly Ser Leu Lys Lys Gln Ser
885 890 895

Pro Lys Val Thr Phe Glu Cys Glu Gln Lys Glu Glu Asn Gln Gly Lys
900 905 910

Asn Glu Ser Asn Ile Lys Pro Val Gln Thr Val Asn Ile Thr Ala Gly
915 920 925

Phe Pro Val Val Gly Gln Lys Asp Lys Pro Val Asp Asn Ala Lys Cys
930 935 940

Ser Ile Lys Gly Gly Ser Arg Phe Cys Leu Ser Ser Gln Phe Arg Gly
945 950 955 960

Asn Glu Thr Gly Leu Ile Thr Pro Asn Lys His Gly Leu Leu Gln Asn
965 970 975

Pro Tyr Arg Ile Pro Pro Leu Phe Pro Ile Lys Ser Phe Val Lys Thr
980 985 990

Lys Cys Lys Lys Asn Leu Leu Glu Glu Asn Phe Glu Glu His Ser Met
995 1000 1005

Ser Pro Glu Arg Glu Met Gly Asn Glu Asn Ile Pro Ser Thr Val Ser
1010 1015 1020

Thr Ile Ser Arg Asn Asn Ile Arg Glu Asn Val Phe Lys Glu Ala Ser
1025 1030 1035 1040

Ser Ser Asn Ile Asn Glu Val Gly Ser Ser Thr Asn Glu Val Gly Ser
1045 1050 1055

Ser Ile Asn Glu Ile Gly Ser Ser Asp Glu Asn Ile Gln Ala Glu Leu
1060 1065 1070

Gly Arg Asn Arg Gly Pro Lys Leu Asn Ala Met Leu Arg Leu Gly Val
1075 1080 1085

Leu Gln Pro Glu Val Tyr Lys Gln Ser Leu Pro Gly Ser Asn Cys Lys
1090 1095 1100

His Pro Glu Ile Lys Lys Gln Glu Tyr Glu Glu Val Val Gln Thr Val
1105 1110 1115 1120

Asn Thr Asp Phe Ser Pro Tyr Leu Ile Ser Asp Asn Leu Glu Gln Pro
1125 1130 1135

Met Gly Ser Ser His Ala Ser Gln Val Cys Ser Glu Thr Pro Asp Asp
1140 1145 1150

Leu Leu Asp Asp Gly Glu Ile Lys Glu Asp Thr Ser Phe Ala Glu Asn
1155 1160 1165

Asp Ile Lys Glu Ser Ser Ala Val Phe Ser Lys Ser Val Gln Lys Gly
1170 1175 1180

Glu Leu Ser Arg Ser Pro Ser Pro Phe Thr His Thr His Leu Ala Gln
1185 1190 1195 1200

Gly Tyr Arg Arg Gly Ala Lys Lys Leu Glu Ser Ser Glu Glu Asn Leu
1205 1210 1215

Ser Ser Glu Asp Glu Glu Leu Pro Cys Phe Gln His Leu Leu Phe Gly
1220 1225 1230

Lys Val Asn Asn Ile Pro Ser Gln Ser Thr Arg His Ser Thr Val Ala
1235 1240 1245

Thr Glu Cys Leu Ser Lys Asn Thr Glu Glu Asn Leu Leu Ser Leu Lys
1250 1255 1260

Asn Ser Leu Asn Asp Cys Ser Asn Gln Val Ile Leu Ala Lys Ala Ser
1265 1270 1275 1280

Gln Glu His His Leu Ser Glu Glu Thr Lys Cys Ser Ala Ser Leu Phe
1285 1290 1295

Ser Ser Gln Cys Ser Glu Leu Glu Asp Leu Thr Ala Asn Thr Asn Thr
1300 1305 1310

Gln Asp Pro Phe Leu Ile Gly Ser Ser Lys Gln Met Arg His Gln Ser
1315 1320 1325

Glu Ser Gln Gly Val Gly Leu Ser Asp Lys Glu Leu Val Ser Asp Asp
1330 1335 1340

Glu Glu Arg Gly Thr Gly Leu Glu Glu Asn Asn Gln Glu Glu Gln Ser
1345 1350 1355 1360

Met Asp Ser Asn Leu Gly Glu Ala Ala Ser Gly Cys Glu Ser Glu Thr
1365 1370 1375

Ser Val Ser Glu Asp Cys Ser Gly Leu Ser Ser Gln Ser Asp Ile Leu
1380 1385 1390

Thr Thr Gln Gln Arg Asp Thr Met Gln His Asn Leu Ile Lys Leu Gln
1395 1400 1405

Gln Glu Met Ala Glu Leu Glu Ala Val Leu Glu Gln His Gly Ser Gln
1410 1415 1420

Pro Ser Asn Ser Tyr Pro Ser Ile Ile Ser Asp Ser Ser Ala Leu Glu
1425 1430 1435 1440

Asp Leu Arg Asn Pro Glu Gln Ser Thr Ser Glu Lys Ala Val Leu Thr
1445 1450 1455

Ser Gln Lys Ser Ser Glu Tyr Pro Ile Ser Gln Asn Pro Glu Gly Leu
1460 1465 1470

Ser Ala Asp Lys Phe Glu Val Ser Ala Asp Ser Ser Thr Ser Lys Asn
1475 1480 1485

Lys Glu Pro Gly Val Glu Arg Ser Ser Pro Ser Lys Cys Pro Ser Leu
1490 1495 1500

Asp Asp Arg Trp Tyr Met His Ser Cys Ser Gly Ser Leu Gln Asn Arg
1505 1510 1515 1520

Asn Tyr Pro Ser Gln Glu Glu Leu Ile Lys Val Val Asp Val Glu Glu
1525 1530 1535

Gln Gln Leu Glu Glu Ser Gly Pro His Asp Leu Thr Glu Thr Ser Tyr
1540 1545 1550

Leu Pro Arg Gln Asp Leu Glu Gly Thr Pro Tyr Leu Glu Ser Gly Ile
1555 1560 1565

Ser Leu Phe Ser Asp Asp Pro Glu Ser Asp Pro Ser Glu Asp Arg Ala
1570 1575 1580

Pro Glu Ser Ala Arg Val Gly Asn Ile Pro Ser Ser Thr Ser Ala Leu
1585 1590 1595 1600

Lys Val Pro Gln Leu Lys Val Ala Glu Ser Ala Gln Ser Pro Ala Ala
1605 1610 1615

Ala His Thr Thr Asp Thr Ala Gly Tyr Asn Ala Met Glu Glu Ser Val
1620 1625 1630

Ser Arg Glu Lys Pro Glu Leu Thr Ala Ser Thr Glu Arg Val Asn Lys
1635 1640 1645

Arg Met Ser Met Val Val Ser Gly Leu Thr Pro Glu Glu Phe Met Leu
1650 1655 1660

Val Tyr Lys Phe Ala Arg Lys His His Ile Thr Leu Thr Asn Leu Ile
1665 1670 1675 1680

Thr Glu Glu Thr Thr His Val Val Met Lys Thr Asp Ala Glu Phe Val
1685 1690 1695

Cys Glu Arg Thr Leu Lys Tyr Phe Leu Gly Ile Ala Gly Gly Lys Trp
1700 1705 1710

Val Val Ser Tyr Phe Trp Val Thr Gln Ser Ile Lys Glu Arg Lys Met
1715 1720 1725

Leu Asn Glu His Asp Phe Glu Val Arg Gly Asp Val Val Asn Gly Arg
1730 1735 1740

Asn His Gln Gly Pro Lys Arg Ala Arg Glu Ser Gln Asp Arg Lys Ile
1745 1750 1755 1760

Phe Arg Gly Leu Glu Ile Cys Cys Tyr Gly Pro Phe Thr Asn Met Pro
1765 1770 1775

Thr Asp Gln Leu Glu Trp Met Val Gln Leu Cys Gly Ala Ser Val Val
1780 1785 1790

Lys Glu Leu Ser Ser Phe Thr Leu Gly Thr Gly Val His Pro Ile Val
1795 1800 1805

Val Val Gln Pro Asp Ala Trp Thr Glu Asp Asn Gly Phe His Ala Ile
1810 1815 1820

Gly Gln Met Cys Glu Ala Pro Val Val Thr Arg Glu Trp Val Leu Asp
1825 1830 1835 1840

Ser Val Ala Leu Tyr Gln Cys Gln Glu Leu Asp Thr Tyr Leu Ile Pro
1845 1850 1855

Gln Ile Pro His Ser His Tyr
1860



(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5711 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(B) STRAIN: BRCA1

(viii) POSITION IN GENOME:

(A) CHROMOSOME/SEGMENT: 17

(B) MAP POSITION: 17q21



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:

AGCTCGCTGA GACTTCCTGG ACCCCGCACC AGGCTGTGGG GTTTCTCAGA TAACTGGGCC 60
CCTGCGCTCA GGAGGCCTTC ACCCTCTGCT CTGGGTAAAG TTCATTGGAA CAGAAAGAAA 120
TGGATTTATC TGCTCTTCGC GTTGAAGAAG TACAAAATGT CATTAATGCT ATGCAGAAAA 180
TCTTAGAGTG TCCCATCTGT CTGGAGTTGA TCAAGGAACC TGTCTCCACA AAGTGTGACC 240
ACATATTTTG CAAATTTTGC ATGCTGAAAC TTCTCAACCA GAAGAAAGGG CCTTCACAGT 300
GTCCTTTATG TAAGAATGAT ATAACCAAAA GGAGCCTACA AGAAAGTACG AGATTTAGTC 360
AACTTGTTGA AGAGCTATTG AAAATCATTT GTGCTTTTCA GCTTGACACA GGTTTGGAGT 420
ATGCAAACAG CTATAATTTT GCAAAAAAGG AAAATAACTC TCCTGAACAT CTAAAAGATG 480
AAGTTTCTAT CATCCAAAGT ATGGGCTACA GAAACCGTGC CAAAAGACTT CTACAGAGTG 540
AACCCGAAAA TCCTTCCTTG CAGGAAACCA GTCTCAGTGT CCAACTCTCT AACCTTGGAA 600
CTGTGAGAAC TCTGAGGACA AAGCAGCGGA TACAACCTCA AAAGACGTCT GTCTACATTG 660
AATTGGGATC TGATTCTTCT GAAGATACCG TTAATAAGGC AACTTATTGC AGTGTGGGAG 720
ATCAAGAATT GTTACAAATC ACCCCTCAAG GAACCAGGGA TGAAATCAGT TTGGATTCTG 780
CAAAAAAGGC TGCTTGTGAA TTTTCTGAGA CGGATGTAAC AAATACTGAA CATCATCAAC 840
CCAGTAATAA TGATTTGAAC ACCACTGAGA AGCGTGCAGC TGAGAGGCAT CCAGAAAAGT 900
ATCAGGGTAG TTCTGTTTCA AACTTGCATG TGGAGCCATG TGGCACAAAT ACTCATGCCA 960
GCTCATTACA GCATGAGAAC AGCAGTTTAT TACTCACTAA AGACAGAATG AATGTAGAAA 1020
AGGCTGAATT CTGTAATAAA AGCAAACAGC CTGGCTTAGC AAGGAGCCAA CATAACAGAT 1080
GGGCTGGAAG TAAGGAAACA TGTAATGATA GGCGGACTCC CAGCACAGAA AAAAAGGTAG 1140

ATCTGAATGC TGATCCCCTG TGTGAGAGAA AAGAATGGAA TAAGCAGAAA CTGCCATGCT 1200
CAGAGAATCC TAGAGATACT GAAGATGTTC CTTGGATAAC ACTAAATAGC AGCATTCAGA 1260
AAGTTAATGA GTGGTTTTCC AGAAGTGATG AACTGTTAGG TTCTGATGAC TCACATGATG 1320
GGGAGTCTGA ATCAAATGCC AAAGTAGCTG ATGTATTGGA CGTTCTAAAT GAGGTAGATG 1380
AATATTCTGG TTCTTCAGAG AAAATAGACT TACTGGCCAG TGATCCTCAT GAGGCTTTAA 1440
TATGTAAAAG TGAAAGAGTT CACTCCAAAT CAGTAGAGAG TAATATTGAA GACAAAATAT 1500
TTGGGAAAAC CTATCGGAAG AAGGCAAGCC TCCCCAACTT AAGCCATGTA ACTGAAAATC 1560
TAATTATAGG AGCATTTGTT ACTGAGCCAC AGATAATACA AGAGCGTCCC CTCACAAATA 1620
AATTAAAGCG TAAAAGGAGA CCTACATCAG GCCTTCATCC TGAGGATTTT ATCAAGAAAG 1680
CAGATTTGGC AGTTCAAAAG ACTCCTGAAA TGATAAATCA GGGAACTAAC CAAACGGAGC 1740
AGAATGGTCA AGTGATGAAT ATTACTAATA GTGGTCATGA GAATAAAACA AAAGGTGATT 1800
CTATTCAGAA TGAGAAAAAT CCTAACCCAA TAGAATCACT CGAAAAAGAA TCTGCTTTCA 1860
AAACGAAAGC TGAACCTATA AGCAGCAGTA TAAGCAATAT GGAACTCGAA TTAAATATCC 1920
ACAATTCAAA AGCACCTAAA AAGAATAGGC TGAGGAGGAA GTCTTCTACC AGGCATATTC 1980
ATGCGCTTGA ACTAGTAGTC AGTAGAAATC TAAGCCCACC TAATTGTACT GAATTGCAAA 2040
TTGATAGTTG TTCTAGCAGT GAAGAGATAA AGAAAAAAAA GTACAACCAA ATGCCAGTCA 2100
GGCACAGCAG AAACCTACAA CTCATGGAAG GTAAAGAACC TGCAACTGGA GCCAAGAAGA 2160
GTAACAAGCC AAATGAACAG ACAAGTAAAA GACATGACAG TGATACTTTC CCAGAGCTGA 2220
AGTTAACAAA TGCACCTGGT TCTTTTACTA AGTGTTCAAA TACCAGTGAA CTTAAAGAAT 2280
TTGTCAATCC TAGCCTTCCA AGAGAAGAAA AAGAAGAGAA ACTAGAAACA GTTAAAGTGT 2340
CTAATAATGC TGAAGACCCC AAAGATCTCA TGTTAAGTGG AGAAAGGGTT TTGCAAACTG 2400
AAAGATCTGT AGAGAGTAGC AGTATTTCAC TGGTACCTGG TACTGATTAT GGCACTCAGG 2460
AAAGTATCTC GTTACTGGAA GTTAGCACTC TAGGGAAGGC AAAAACAGAA CCAAATAAAT 2520
GTGTGAGTCA GTGTGCAGCA TTTGAAAACC CCAAGGGACT AATTCATGGT TGTTCCAAAG 2580
ATAATAGAAA TGACACAGAA GGCTTTAAGT ATCCATTGGG ACATGAAGTT AACCACAGTC 2640
GGGAAACAAG CATAGAAATG GAAGAAAGTG AACTTGATGC TCAGTATTTG CAGAATACAT 2700
TCAAGGTTTC AAAGCGCCAG TCATTTGCTC TGTTTTCAAA TCCAGGAAAT GCAGAAGAGG 2760
AATGTGCAAC ATTCTCTGCC CACTCTGGGT CCTTAAAGAA ACAAAGTCCA AAAGTCACTT 2820
TTGAATGTGA ACAAAAGGAA GAAAATCAAG GAAAGAATGA GTCTAATATC AAGCCTGTAC 2880
AGACAGTTAA TATCACTGCA GGCTTTCCTG TGGTTGGTCA GAAAGATAAG CCAGTTGATA 2940
ATGCCAAATG TAGTATCAAA GGAGGCTCTA GGTTTTGTCT ATCATCTCAG TTCAGAGGCA 3000
ACGAAACTGG ACTCATTACT CCAAATAAAC ATGGACTTTT ACAAAACCCA TATCGTATAC 3060
CACCACTTTT TCCCATCAAG TCATTTGTTA AAACTAAATG TAAGAAAAAT CTGCTAGAGG 3120
AAAACTTTGA GGAACATTCA ATGTCACCTG AAAGAGAAAT GGGAAATGAG AACATTCCAA 3180
GTACAGTGAG CACAATTAGC CGTAATAACA TTAGAGAAAA TGTTTTTAAA GGAGCCAGCT 3240
CAAGCAATAT TAATGAAGTA GGTTCCAGTA CTAATGAAGT GGGCTCCAGT ATTAATGAAA 3300
TAGGTTCCAG TGATGAAAAC ATTCAAGCAG AACTAGGTAG AAACAGAGGG CCAAAATTGA 3360
ATGCTATGCT TAGATTAGGG GTTTTGCAAC CTGAGGTCTA TAAACAAAGT CTTCCTGGAA 3420
GTAATTGTAA GCATCCTGAA ATAAAAAAGC AAGAATATGA AGAAGTAGTT CAGACTGTTA 3480
ATACAGATTT CTCTCCATAT CTGATTTCAG ATAACTTAGA ACAGCCTATG GGAAGTAGTC 3540

ATGCATCTCA GGTTTGTTCT GAGACACCTG ATGACCTGTT AGATGATGGT GAAATAAAGG 3600
AAGATACTAG TTTTGCTGAA AATGACATTA AGGAAAGTTC TGCTGTTTTT AGCAAAAGCG 3660
TCCAGAGAGG AGAGCTTAGC AGGAGTCCTA GCCCTTTCAC CCATACACAT TTGGCTCAGG 3720
GTTACCGAAG AGGGGCCAAG AAATTAGAGT CCTCAGAAGA GAACTTATCT AGTGAGGATG 3780
AAGAGCTTCC CTGCTTCCAA CACTTGTTAT TTGGTAAAGT AAACAATATA CCTTCTCAGT 3840
CTACTAGGCA TAGCACCGTT GCTACCGAGT GTCTGTCTAA GAACACAGAG GAGAATTTAT 3900
TATCATTGAA GAATAGCTTA AATGACTGCA GTAACCAGGT AATATTGGCA AAGGCATCTC 3960
AGGAACATCA CCTTAGTGAG GAAACAAAAT GTTCTGCTAG CTTGTTTTCT TCACAGTGCA 4020
GTGAATTGGA AGACTTGACT GCAAATACAA ACACCCAGGA TCCTTTCTTG ATTGGTTCTT 4080
CCAAACAAAT GAGGCATCAG TCTGAAAGCC AGGGAGTTGG TCTGAGTGAC AAGGAATTGG 4140
TTTCAGATGA TGAAGAAAGA GGAACGGGCT TGGAAGAAAA TAATCAAGAA GAGCAAAGCA 4200
TGGATTCAAA CTTAGGTGAA GCAGCATCTG GGTGTGAGAG TGAAACAAGC GTCTCTGAAG 4260
ACTGCTCAGG GCTATCCTCT CAGAGTGACA TTTTAACCAC TCAGCAGAGG GATACCATGC 4320
AACATAACCT GATAAAGCTC CAGCAGGAAA TGGCTGAACT AGAAGCTGTG TTAGAACAGC 4380
ATGGGAGCCA GCCTTCTAAC AGCTACCCTT CCATCATAAG TGACTCTTCT GCCCTTGAGG 4440
ACCTGCGAAA TCCAGAACAA AGCACATCAG AAAAAGCAGT ATTAACTTCA CAGAAAAGTA 4500
GTGAATACCC TATAAGCCAG AATCCAGAAG GCCTTTCTGC TGACAAGTTT GAGGTGTCTG 4560
CAGATAGTTC TACCAGTAAA AATAAAGAAC CAGGAGTGGA AAGGTCATCC CCTTCTAAAT 4620
GCCCATCATT AGATGATAGG TGGTACATGC ACAGTTGCTC TGGGAGTCTT CAGAATAGAA 4680
ACTACCCATC TCAAGAGGAG CTCATTAAGG TTGTTGATGT GGAGGAGCAA CAGCTGGAAG 4740

AGTCTGGGCC ACACGATTTG ACGGAAACAT CTTACTTGCC AAGGCAAGAT CTAGAGGGAA 4800
CCCCTTACCT GGAATCTGGA ATCAGCCTCT TCTCTGATGA CCCTGAATCT GATCCTTCTG 4860
AAGACAGAGC CCCAGAGTCA GCTCGTGTTG GCAACATACC ATCTTCAACC TCTGCATTGA 4920
AAGTTCCCCA ATTGAAAGTT GCAGAATCTG CCCAGGGTCC AGCTGCTGCT CATACTACTG 4980
ATACTGCTGG GTATAATGCA ATGGAAGAAA GTGTGAGCAG GGAGAAGCCA GAATTGACAG 5040
CTTCAACAGA AAGGGTCAAC AAAAGAATGT CCATGGTGGT GTCTGGCCTG ACCCCAGAAG 5100
AATTTATGCT CGTGTACAAG TTTGCCAGAA AACACCACAT CACTTTAACT AATCTAATTA 5160
CTGAAGAGAC TACTCATGTT GTTATGAAAA CAGATGCTGA GTTTGTGTGT GAACGGACAC 5220
TGAAATATTT TCTAGGAATT GCGGGAGGAA AATGGGTAGT TAGCTATTTC TGGGTGACCC 5280
AGTCTATTAA AGAAAGAAAA ATGCTGAATG AGCATGATTT TGAAGTCAGA GGAGATGTGG 5340
TCAATGGAAG AAACCACCAA GGTCCAAAGC GAGCAAGAGA ATCCCAGGAC AGAAAGATCT 5400
TCAGGGGGCT AGAAATCTGT TGCTATGGGC CCTTCACCAA CATGCCCACA GATCAACTGG 5460
AATGGATGGT ACAGCTGTGT GGTGCTTCTG TGGTGAAGGA GCTTTCATCA TTCACCCTTG 5520
GCACAGGTGT CCACCCAATT GTGGTTGTGC AGCCAGATGC CTGGACAGAG GACAATGGCT 5580
TCCATGCAAT TGGGCAGATG TGTGAGGCAC CTGTGGTGAC CCGAGAGTGG GTGTTGGACA 5640
GTGTAGCACT CTACCAGTGC CAGGAGCTGG ACACCTACCT GATACCCCAG ATCCCCCACA 5700
GCCACTACTG A 5711
(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1863 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: not relevant

(ii) MOLECULE TYPE: protein

(vi) ORIGINAL SOURCE:

(A) ORGANISM: Homo sapiens

(B) STRAIN: BRCA1

(viii) POSITION IN GENOME:

(A) CHROMOSOME/SEGMENT: 17

(B) MAP POSITION: 17q21

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6:

Met Asp Leu Ser Ala Leu Arg Val Glu Glu Val Gln Asn Val Ile Asn
1 5 10 15

Ala Met Gln Lys Ile Leu Glu Cys Pro Ile Cys Leu Glu Leu Ile Lys
20 25 30

Glu Pro Val Ser Thr Lys Cys Asp His Ile Phe Cys Lys Phe Cys Met
35 40 45

Leu Lys Leu Leu Asn Gln Lys Lys Gly Pro Ser Gln Cys Pro Leu Cys
50 55 60

Lys Asn Asp Ile Thr Lys Arg Ser Leu Gln Glu Ser Thr Arg Phe Ser
65 70 75 80

Gln Leu Val Glu Glu Leu Leu Lys Ile Ile Cys Ala Phe Gln Leu Asp
85 90 95

Thr Gly Leu Glu Tyr Ala Asn Ser Tyr Asn Phe Ala Lys Lys Glu Asn
100 105 110

Asn Ser Pro Glu His Leu Lys Asp Glu Val Ser Ile Ile Gln Ser Met
115 120 125

Gly Tyr Arg Asn Arg Ala Lys Arg Leu Leu Gln Ser Glu Pro Glu Asn
130 135 140

Pro Ser Leu Gln Glu Thr Ser Leu Ser Val Gln Leu Ser Asn Leu Gly
145 150 155 160

Thr Val Arg Thr Leu Arg Thr Lys Gln Arg Ile Gln Pro Gln Lys Thr
165 170 175

Ser Val Tyr Ile Glu Leu Gly Ser Asp Ser Ser Glu Asp Thr Val Asn
180 185 190

Lys Ala Thr Tyr Cys Ser Val Gly Asp Gln Glu Leu Leu Gln Ile Thr
195 200 205

Pro Gln Gly Thr Arg Asp Glu Ile Ser Leu Asp Ser Ala Lys Lys Ala
210 215 220

Ala Cys Glu Phe Ser Glu Thr Asp Val Thr Asn Thr Glu His His Gln
225 230 235 240

Pro Ser Asn Asn Asp Leu Asn Thr Thr Glu Lys Arg Ala Ala Glu Arg
245 250 255

His Pro Glu Lys Tyr Gln Gly Ser Ser Val Ser Asn Leu His Val Glu
260 265 270

Pro Cys Gly Thr Asn Thr His Ala Ser Ser Leu Gln His Glu Asn Ser
275 280 285

Ser Leu Leu Leu Thr Lys Asp Arg Met Asn Val Glu Lys Ala Glu Phe
290 295 300

Cys Asn Lys Ser Lys Gln Pro Gly Leu Ala Arg Ser Gln His Asn Arg
305 310 315 320

Trp Ala Gly Ser Lys Glu Thr Cys Asn Asp Arg Arg Thr Pro Ser Thr
325 330 335

Glu Lys Lys Val Asp Leu Asn Ala Asp Pro Leu Cys Glu Arg Lys Glu
340 345 350

Trp Asn Lys Gln Lys Leu Pro Cys Ser Glu Asn Pro Arg Asp Thr Glu
355 360 365



Asp Val Pro Trp Ile Thr Leu Asn Ser Ser Ile Gln Lys Val Asn Glu
370 375 380

Trp Phe Ser Arg Ser Asp Glu Leu Leu Gly Ser Asp Asp Ser His Asp
385 390 395 400

Gly Glu Ser Glu Ser Asn Ala Lys Val Ala Asp Val Leu Asp Val Leu
405 410 415

Asn Glu Val Asp Glu Tyr Ser Gly Ser Ser Glu Lys Ile Asp Leu Leu
420 425 430

Ala Ser Asp Pro His Glu Ala Leu Ile Cys Lys Ser Glu Arg Val His
435 440 445

Ser Lys Ser Val Glu Ser Asn Ile Glu Asp Lys Ile Phe Gly Lys Thr
450 455 460

Tyr Arg Lys Lys Ala Ser Leu Pro Asn Leu Ser His Val Thr Glu Asn
465 470 475 480

Leu Ile Ile Gly Ala Phe Val Thr Glu Pro Gln Ile Ile Gln Glu Arg
485 490 495

Pro Leu Thr Asn Lys Leu Lys Arg Lys Arg Arg Pro Thr Ser Gly Leu
500 505 510

His Pro Glu Asp Phe Ile Lys Lys Ala Asp Leu Ala Val Gln Lys Thr
515 520 525

Pro Glu Met Ile Asn Gln Gly Thr Asn Gln Thr Glu Gln Asn Gly Gln
530 535 540

Val Met Asn Ile Thr Asn Ser Gly His Glu Asn Lys Thr Lys Gly Asp
545 550 555 560

Ser Ile Gln Asn Glu Lys Asn Pro Asn Pro Ile Glu Ser Leu Glu Lys
565 570 575

Glu Ser Ala Phe Lys Thr Lys Ala Glu Pro Ile Ser Ser Ser Ile Ser
580 585 590

Asn Met Glu Leu Glu Leu Asn Ile His Asn Ser Lys Ala Pro Lys Lys
595 600 605

Asn Arg Leu Arg Arg Lys Ser Ser Thr Arg His Ile His Ala Leu Glu
610 615 620

Leu Val Val Ser Arg Asn Leu Ser Pro Pro Asn Cys Thr Glu Leu Gln
625 630 635 640

Ile Asp Ser Cys Ser Ser Ser Glu Glu Ile Lys Lys Lys Lys Tyr Asn
645 650 655

Gln Met Pro Val Arg His Ser Arg Asn Leu Gln Leu Met Glu Gly Lys
660 665 670

Glu Pro Ala Thr Gly Ala Lys Lys Ser Asn Lys Pro Asn Glu Gln Thr
675 680 685

Ser Lys Arg His Asp Ser Asp Thr Phe Pro Glu Leu Lys Leu Thr Asn
690 695 700

Ala Pro Gly Ser Phe Thr Lys Cys Ser Asn Thr Ser Glu Leu Lys Glu
705 710 715 720

Phe Val Asn Pro Ser Leu Pro Arg Glu Glu Lys Glu Glu Lys Leu Glu
725 730 735

Thr Val Lys Val Ser Asn Asn Ala Glu Asp Pro Lys Asp Leu Met Leu
740 745 750

Ser Gly Glu Arg Val Leu Gln Thr Glu Arg Ser Val Glu Ser Ser Ser
755 760 765

Ile Ser Leu Val Pro Gly Thr Asp Tyr Gly Thr Gln Glu Ser Ile Ser
770 775 780

Leu Leu Glu Val Ser Thr Leu Gly Lys Ala Lys Thr Glu Pro Asn Lys
785 790 795 800

Cys Val Ser Gln Cys Ala Ala Phe Glu Asn Pro Lys Gly Leu Ile His
805 810 815

Gly Cys Ser Lys Asp Asn Arg Asn Asp Thr Glu Gly Phe Lys Tyr Pro
820 825 830

Leu Gly His Glu Val Asn His Ser Arg Glu Thr Ser Ile Glu Met Glu
835 840 845

Glu Ser Glu Leu Asp Ala Gln Tyr Leu Gln Asn Thr Phe Lys Val Ser
850 855 860

Lys Arg Gln Ser Phe Ala Leu Phe Ser Asn Pro Gly Asn Ala Glu Glu
865 870 875 880

Glu Cys Ala Thr Phe Ser Ala His Ser Gly Ser Leu Lys Lys Gln Ser
885 890 895

Pro Lys Val Thr Phe Glu Cys Glu Gln Lys Glu Glu Asn Gln Gly Lys
900 905 910

Asn Glu Ser Asn Ile Lys Pro Val Gln Thr Val Asn Ile Thr Ala Gly
915 920 925

Phe Pro Val Val Gly Gln Lys Asp Lys Pro Val Asp Asn Ala Lys Cys
930 935 940

Ser Ile Lys Gly Gly Ser Arg Phe Cys Leu Ser Ser Gln Phe Arg Gly
945 950 955 960

Asn Glu Thr Gly Leu Ile Thr Pro Asn Lys His Gly Leu Leu Gln Asn
965 970 975

Pro Tyr Arg Ile Pro Pro Leu Phe Pro Ile Lys Ser Phe Val Lys Thr
980 985 990

Lys Cys Lys Lys Asn Leu Leu Glu Glu Asn Phe Glu Glu His Ser Met
995 1000 1005

Ser Pro Glu Arg Glu Met Gly Asn Glu Asn Ile Pro Ser Thr Val Ser
1010 1015 1020

Thr Ile Ser Arg Asn Asn Ile Arg Glu Asn Val Phe Lys Gly Ala Ser
1025 1030 1035 1040

Ser Ser Asn Ile Asn Glu Val Gly Ser Ser Thr Asn Glu Val Gly Ser
1045 1050 1055

Ser Ile Asn Glu Ile Gly Ser Ser Asp Glu Asn Ile Gln Ala Glu Leu
1060 1065 1070

Gly Arg Asn Arg Gly Pro Lys Leu Asn Ala Met Leu Arg Leu Gly Val
1075 1080 1085

Leu Gln Pro Glu Val Tyr Lys Gln Ser Leu Pro Gly Ser Asn Cys Lys
1090 1095 1100

His Pro Glu Ile Lys Lys Gln Glu Tyr Glu Glu Val Val Gln Thr Val
1105 1110 1115 1120

Asn Thr Asp Phe Ser Pro Tyr Leu Ile Ser Asp Asn Leu Glu Gln Pro
1125 1130 1135

Met Gly Ser Ser His Ala Ser Gln Val Cys Ser Glu Thr Pro Asp Asp
1140 1145 1150

Leu Leu Asp Asp Gly Glu Ile Lys Glu Asp Thr Ser Phe Ala Glu Asn
1155 1160 1165

Asp Ile Lys Glu Ser Ser Ala Val Phe Ser Lys Ser Val Gln Arg Gly
1170 1175 1180

Glu Leu Ser Arg Ser Pro Ser Pro Phe Thr His Thr His Leu Ala Gln
1185 1190 1195 1200

Gly Tyr Arg Arg Gly Ala Lys Lys Leu Glu Ser Ser Glu Glu Asn Leu
1205 1210 1215

Ser Ser Glu Asp Glu Glu Leu Pro Cys Phe Gln His Leu Leu Phe Gly
1220 1225 1230

Lys Val Asn Asn Ile Pro Ser Gln Ser Thr Arg His Ser Thr Val Ala
1235 1240 1245

Thr Glu Cys Leu Ser Lys Asn Thr Glu Glu Asn Leu Leu Ser Leu Lys
1250 1255 1260

Asn Ser Leu Asn Asp Cys Ser Asn Gln Val Ile Leu Ala Lys Ala Ser
1265 1270 1275 1280

Gln Glu His His Leu Ser Glu Glu Thr Lys Cys Ser Ala Ser Leu Phe
1285 1290 1295

Ser Ser Gln Cys Ser Glu Leu Glu Asp Leu Thr Ala Asn Thr Asn Thr
1300 1305 1310

Gln Asp Pro Phe Leu Ile Gly Ser Ser Lys Gln Met Arg His Gln Ser
1315 1320 1325

Glu Ser Gln Gly Val Gly Leu Ser Asp Lys Glu Leu Val Ser Asp Asp
1330 1335 1340

Glu Glu Arg Gly Thr Gly Leu Glu Glu Asn Asn Gln Glu Glu Gln Ser
1345 1350 1355 1360

Met Asp Ser Asn Leu Gly Glu Ala Ala Ser Gly Cys Glu Ser Glu Thr
1365 1370 1375

Ser Val Ser Glu Asp Cys Ser Gly Leu Ser Ser Gln Ser Asp Ile Leu
1380 1385 1390

Thr Thr Gln Gln Arg Asp Thr Met Gln His Asn Leu Ile Lys Leu Gln
1395 1400 1405

Gln Glu Met Ala Glu Leu Glu Ala Val Leu Glu Gln His Gly Ser Gln
1410 1415 1420

Pro Ser Asn Ser Tyr Pro Ser Ile Ile Ser Asp Ser Ser Ala Leu Glu
1425 1430 1435 1440

Asp Leu Arg Asn Pro Glu Gln Ser Thr Ser Glu Lys Ala Val Leu Thr
1445 1450 1455

Ser Gln Lys Ser Ser Glu Tyr Pro Ile Ser Gln Asn Pro Glu Gly Leu
1460 1465 1470

Ser Ala Asp Lys Phe Glu Val Ser Ala Asp Ser Ser Thr Ser Lys Asn
1475 1480 1485

Lys Glu Pro Gly Val Glu Arg Ser Ser Pro Ser Lys Cys Pro Ser Leu
1490 1495 1500

Asp Asp Arg Trp Tyr Met His Ser Cys Ser Gly Ser Leu Gln Asn Arg
1505 1510 1515 1520

Asn Tyr Pro Ser Gln Glu Glu Leu Ile Lys Val Val Asp Val Glu Glu
1525 1530 1535

Gln Gln Leu Glu Glu Ser Gly Pro His Asp Leu Thr Glu Thr Ser Tyr
1540 1545 1550

Leu Pro Arg Gln Asp Leu Glu Gly Thr Pro Tyr Leu Glu Ser Gly Ile
1555 1560 1565

Ser Leu Phe Ser Asp Asp Pro Glu Ser Asp Pro Ser Glu Asp Arg Ala
1570 1575 1580

Pro Glu Ser Ala Arg Val Gly Asn Ile Pro Ser Ser Thr Ser Ala Leu
1585 1590 1595 1600

Lys Val Pro Gln Leu Lys Val Ala Glu Ser Ala Gln Gly Pro Ala Ala
1605 1610 1615

Ala His Thr Thr Asp Thr Ala Gly Tyr Asn Ala Met Glu Glu Ser Val
1620 1625 1630

Ser Arg Glu Lys Pro Glu Leu Thr Ala Ser Thr Glu Arg Val Asn Lys
1635 1640 1645

Arg Met Ser Met Val Val Ser Gly Leu Thr Pro Glu Glu Phe Met Leu
1650 1655 1660

Val Tyr Lys Phe Ala Arg Lys His His Ile Thr Leu Thr Asn Leu Ile
1665 1670 1675 1680

Thr Glu Glu Thr Thr His Val Val Met Lys Thr Asp Ala Glu Phe Val
1685 1690 1695

Cys Glu Arg Thr Leu Lys Tyr Phe Leu Gly Ile Ala Gly Gly Lys Trp
1700 1705 1710

Val Val Ser Tyr Phe Trp Val Thr Gln Ser Ile Lys Glu Arg Lys Met
1715 1720 1725

Leu Asn Glu His Asp Phe Glu Val Arg Gly Asp Val Val Asn Gly Arg
1730 1735 1740

Asn His Gln Gly Pro Lys Arg Ala Arg Glu Ser Gln Asp Arg Lys Ile
1745 1750 1755 1760

Phe Arg Gly Leu Glu Ile Cys Cys Tyr Gly Pro Phe Thr Asn Met Pro
1765 1770 1775

Thr Asp Gln Leu Glu Trp Met Val Gln Leu Cys Gly Ala Ser Val Val
1780 1785 1790

Lys Glu Leu Ser Ser Phe Thr Leu Gly Thr Gly Val His Pro Ile Val
1795 1800 1805

Val Val Gln Pro Asp Ala Trp Thr Glu Asp Asn Gly Phe His Ala Ile
1810 1815 1820

Gly Gln Met Cys Glu Ala Pro Val Val Thr Arg Glu Trp Val Leu Asp
1825 1830 1835 1840

Ser Val Ala Leu Tyr Gln Cys Gln Glu Leu Asp Thr Tyr Leu Ile Pro
1845 1850 1855

Gln Ile Pro His Ser His Tyr
1860

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(B) STRAIN: 2F primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:
GAAGTTGTCA TTTTATAAAC CTTT
(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 2R primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
TGTCTTTTCT TCCCTAGTAT GT
(2) INFORMATION FOR SEQ ID NO: 9:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 3F primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

TCCTGACACA GCAGACATTT A 

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 3R primer 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:
TTGGATTTTC GTTCTCACTT A
(2) INFORMATION FOR SEQ ID NO: 11:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 5F primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:
CTCTTAAGGG CAGTTGTGAG

(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 5R-M13* primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
TTCCTACTGT GGTTGCTTCC

(2) INFORMATION FOR SEQ ID NO: 13:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 6/7F primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:
CTTATTTTAG TGTCCTTAAA AGG 
(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 
(B) STRAIN: 6R 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
TTTCATGGAC AGCACTTGAG TG 
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 7F primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
CACAACAAAG AGCATACATA GGG 
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 6/7R primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16:
TCGGGTTCAC TCTGTAGAAG

(2) INFORMATION FOR SEQ ID NO: 17:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)



(vi) ORIGINAL SOURCE: 

(B) STRAIN: 8F1 primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:

TTCTCTTCAG GAGGAAAAGC A 

(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 8R1 primer 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
GCTGCCTACC ACAAATACAA A 
(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(B) STRAIN: 9F primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19:
CCACAGTAGA TGCTCAGTAA ATA 

(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 9R primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20:
TAGGAAAATA CCAGCTTCAT AGA

(2) INFORMATION FOR SEQ ID NO: 21:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(B) STRAIN: 10F primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21:
TGGTCAGCTT TCTGTAATCG 
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 10R primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
GTATCTACCC ACTCTCTTCT TCAG 
(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(B) STRAIN: 11AF primer



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23:
CCACCTCCAA GGTGTATCA 

(2) INFORMATION FOR SEQ ID NO: 24:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11AR primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24:

TGTTATGTTG GCTCCTTGCT

(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(B) STRAIN: 11BF1 primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
CACTAAAGAC AGAATGAATC TA
(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11BR1 primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:

GAAGAACCAG AATATTCATC TA

(2) INFORMATION FOR SEQ ID NO: 27:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11CF1 primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27:
TGATGGGGAG TCTGAATCAA

(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11CR1 primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28:
TCTGCTTTCT TGATAAAATC CT 
(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11DF1 primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
AGCGTCCCCT CACAAATAAA
(2) INFORMATION FOR SEQ ID NO: 30:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11DR1 primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
TCAAGCGCAT GAATATGCCT

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11EF primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:
GTATAAGCAA TATGGAACTC GA
(2) INFORMATION FOR SEQ ID NO: 32:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 23 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11ER primer



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 
TTAAGTTCAC TGGTATTTGA ACA

(2) INFORMATION FOR SEQ ID NO: 33: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11FF primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
GACAGCGATA CTTTCCCAGA

(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs 



(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11FR primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34:
TGGAACAACC ATGAATTAGT C 
(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11GF primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35:
GGAAGTTAGC ACTCTAGGGA 

(2) INFORMATION FOR SEQ ID NO: 36: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11GR primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
GCAGTGATAT TAACTGTCTG TA
(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11HF primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
TGGGTCCTTA AAGAAACAAA GT 
(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)



(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11HR primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
TCAGGTGACA TTGAATCTTC C 

(2) INFORMATION FOR SEQ ID NO: 39: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11IF primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39:
CCACTTTTTC CCATCAAGTC A 
(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(B) STRAIN: 11IR primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:
TCAGGATGCT TACAATTACT TC 
(2) INFORMATION FOR SEQ ID NO: 41: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11JF primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:

CAAAATTGAA TGCTATGCTT AGA 

(2) INFORMATION FOR SEQ ID NO: 42: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11JR primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42:
TCGGTAACCC TGAGCCAAAT
(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11KF primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43:
GCAAAAGCGT CCAGAAAGGA 
(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE:

(B) STRAIN: 11KR-1 primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44:
TATTTGCAGT CAAGTCTTCC AA

(2) INFORMATION FOR SEQ ID NO: 45: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 11LF-1 primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45:
GTAATATTGG CAAAGGCATC T 
(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(B) STRAIN: 11LR primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46:
TAAAATGTGC TCCCCAAAAG CA
(2) INFORMATION FOR SEQ ID NO: 47:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 12F primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:

GTCCTGCCAA TGAGAAGAAA 

(2) INFORMATION FOR SEQ ID NO: 48: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 12R primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:
TGTCAGCAAA CCTAAGAATG T
(2) INFORMATION FOR SEQ ID NO: 49:
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 13F primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:
AATGGAAAGC TTCTCAAAGT A 
(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 13R primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50:

ATGTTGGAGC TAGGTCCTTA C 

(2) INFORMATION FOR SEQ ID NO: 51: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 14F primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51:
CTAACCTGAA TTATCACTAT CA 
(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(vi) ORIGINAL SOURCE: 
(B) STRAIN: 14R primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52:
GTGTATAAAT GCCTGTATGC A
(2) INFORMATION FOR SEQ ID NO: 53:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 15F primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53:
TGGCTGCCCA GGAAGTATG 

(2) INFORMATION FOR SEQ ID NO: 54: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 15R primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54:

AACCAGAATA TCTTTATGTA GGA 
(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(B) STRAIN: 16F primer 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55:
AATTCTTAAC AGAGACCAGA AC
(2) INFORMATION FOR SEQ ID NO: 56:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(B) STRAIN: 16R primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56:

AAAACTCTTT CCAGAATGTT GT

(2) INFORMATION FOR SEQ ID NO: 57: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 17F primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57:
GTGTAGAACG TGCAGGATTG 
(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 17R primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58:

TCGCCTCATG TGGTTTTA

(2) INFORMATION FOR SEQ ID NO: 59:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE:

(B) STRAIN: 18F primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59:

GGCTCTTTAG CTTCTTAGGA C 

(2) INFORMATION FOR SEQ ID NO: 60: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 18R primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60:

GAGACCATTT TCCCAGCATC

(2) INFORMATION FOR SEQ ID NO: 61:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 19F primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61:
CTGTCATTCT TCCTGTGCTC

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 20R primer



(xi) SEQUENCE DESCRIPTION: SEQ ID NO 
GGGAATCCAA ATTACACAGC

(2) INFORMATION FOR SEQ ID NO: 65:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(B) STRAIN: 21F primer



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65:

AAGCTCTTCC TTTTTGAAAG TC 

(2) INFORMATION FOR SEQ ID NO: 66: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 21R primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66:

GTAGAGAAAT AGAATAGCCT CT 

(2) INFORMATION FOR SEQ ID NO: 67: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(B) STRAIN: 22F primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67:
TCCCATTGAG AGGTCTTGCT
(2) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 22R primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68:

GAGAAGACTT CTGAGGCTAC

(2) INFORMATION FOR SEQ ID NO: 69:
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(vi) ORIGINAL SOURCE: 

(B) STRAIN: 23F-1 primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69:
TGAAGTGACA GTTCCAGTAG T 
(2) INFORMATION FOR SEQ ID NO: 70: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(B) STRAIN: 23R-1 primer 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70:
CATTTTAGCC ATTCATTCAA CAA 
(2) INFORMATION FOR SEQ ID NO: 71: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(B) STRAIN: 24F primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71:

ATGAATTGAC ACTAATCTCT GC

(2) INFORMATION FOR SEQ ID NO: 72: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: not relevant 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(vi) ORIGINAL SOURCE:

(B) STRAIN: 24R primer

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72:
GTAGCCAGGA CAGTAGAAGG A